SW388R7

                  Discriminant Analysis – Basic Relationships
Data Analysis &
 Computers II

    Slide 1




                          Discriminant Functions and Scores

                              Describing Relationships

                               Classification Accuracy

                                  Sample Problems

                                     Discriminant analysis

    Slide 2




                     Discriminant analysis is used to analyze relationships between a
                      non-metric dependent variable and metric or dichotomous
                      independent variables.

                     Discriminant analysis attempts to use the independent variables
                      to distinguish among the groups or categories of the dependent
                      variable.

                     The usefulness of a discriminant model is based upon its
                      accuracy rate, or ability to predict the known group
                      memberships in the categories of the dependent variable.

                                      Discriminant scores

    Slide 3



                     Discriminant analysis works by creating a new variable called
                      the discriminant function score which is used to predict to
                      which group a case belongs.
                     Discriminant function scores are computed similarly to factor
                      scores, i.e. using eigenvalues. The computations find the
                      coefficients for the independent variables that maximize the
                      measure of distance between the groups defined by the
                      dependent variable.
                     The discriminant function is similar to a regression equation in
                      which the independent variables are multiplied by coefficients
                      and summed to produce a score.
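
A minimal sketch of the "multiply by coefficients and sum" idea, assuming made-up coefficients, a made-up constant, and a made-up case. None of these numbers come from the SPSS output in these slides; the names discriminant_score, coefficients, and case are hypothetical.

```python
# Illustrative only: the coefficients, constant, and case values are invented.
def discriminant_score(values, coefficients, constant=0.0):
    """Return constant + sum(coefficient * value) over the independent variables."""
    return constant + sum(coefficients[name] * values[name] for name in coefficients)

coefficients = {"hours_worked": -0.05, "years_school": 0.30}  # hypothetical weights
case = {"hours_worked": 40, "years_school": 14}               # hypothetical case

score = discriminant_score(case, coefficients, constant=-2.0)
print(round(score, 2))
```

The form is the same as a regression equation; what differs in discriminant analysis is how the coefficients are chosen (to maximize the separation between groups).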

                                    Discriminant functions

    Slide 4




                     Conceptually, we can think of the discriminant function or
                      equation as defining the boundary between groups.

                     Discriminant scores are standardized, so that if the score falls
                      on one side of the boundary (standard score less than zero), the
                      case is predicted to be a member of one group, and if the score
                      falls on the other side of the boundary (positive standard
                      score), it is predicted to be a member of the other group.

                                      Number of functions

    Slide 5




                     If the dependent variable defines two groups, one statistically
                      significant discriminant function is required to distinguish the
                      groups; if the dependent variable defines three groups, two
                      statistically significant discriminant functions are required to
                      distinguish among the three groups; etc.

                     If a discriminant function is able to distinguish among groups, it
                      must have a strong relationship to at least one of the
                      independent variables.

                     The number of possible discriminant functions in an analysis is
                      limited to the smaller of the number of independent variables
                      or one less than the number of groups defined by the
                      dependent variable.
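
The limit on the number of functions can be written as a one-line rule. The sketch below uses a hypothetical helper name, max_discriminant_functions, and example counts chosen only for illustration.

```python
def max_discriminant_functions(n_independent_vars, n_groups):
    """Smaller of: number of independent variables, or number of groups minus one."""
    return min(n_independent_vars, n_groups - 1)

# Three predictors and a three-group dependent variable: at most 2 functions.
print(max_discriminant_functions(3, 3))
# Four predictors and a two-group dependent variable: at most 1 function.
print(max_discriminant_functions(4, 2))
# One predictor and four groups: the predictor count is the limit.
print(max_discriminant_functions(1, 4))
```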

                                 Overall test of relationship

    Slide 6




                     The overall test of relationship among the independent
                      variables and groups defined by the dependent variable is a
                      series of tests that each of the functions needed to distinguish
                      among the groups is statistically significant.

                     In some analyses, we might discover that two or more of the
                      groups defined by the dependent variable cannot be
                      distinguished using the available independent variables. While
                      it is reasonable to interpret a solution in which there are fewer
                      significant discriminant functions than the maximum number
                      possible, our problems will require that all of the possible
                      discriminant functions be significant.

                         Interpreting the relationship between
                         independent and dependent variables

    Slide 7



                     The interpretative statement about the relationship between
                      the independent variable and the dependent variable is a
                      statement like: cases in group A tended to have higher scores
                      on variable X than cases in group B or group C.

                     This interpretation is complicated by the fact that the
                      relationship is not direct, but operates through the discriminant
                      function.

                     Dependent variable groups are distinguished by scores on
                      discriminant functions, not on values of independent variables.
                      The scores on functions are based on the values of the
                      independent variables that are multiplied by the function
                      coefficients.

                             Groups, functions, and variables

    Slide 8




                     To interpret the relationship between an independent variable
                      and the dependent variable, we must first identify how the
                      discriminant functions separate the groups, and then identify
                      the role of the independent variable in each function.

                     SPSS provides a table called "Functions at Group Centroids"
                      (multivariate means) that indicates which groups are separated
                      by which functions.

                     SPSS provides another table called the "Structure Matrix" which,
                      like its counterpart in factor analysis, identifies the loading, or
                      correlation, between each independent variable and each
                      function. This tells us which variables to interpret for each
                      function. Each variable is interpreted on the function that it
                      loads most highly on.

                                       Functions at Group Centroids

    Slide 9




                   In order to specify the role that each independent
                   variable plays in predicting group membership on the
                   dependent variable, we must link together the
                   relationship between the discriminant functions and the
                   groups defined by the dependent variable, the role of
                   the significant independent variables in the
                   discriminant functions, and the differences in group
                   means for each of the variables.


                                  Functions at Group Centroids

                                                      Function
                                  WELFARE          1             2
                                  1                -.220          .235
                                  2                 .446         -.031
                                  3                -.311         -.362

                                  Unstandardized canonical discriminant
                                  functions evaluated at group means

                   Function 1 separates survey respondents who thought we spend
                   about the right amount of money on welfare (the positive value
                   of 0.446) from survey respondents who thought we spend too much
                   (negative value of -0.311) or little money (negative value of
                   -0.220) on welfare.

                   Function 2 separates survey respondents who thought we spend
                   too little money on welfare (positive value of 0.235) from
                   survey respondents who thought we spend too much money
                   (negative value of -0.362) on welfare. We ignore the second
                   group (-0.031) in this comparison because it was distinguished
                   from the other two groups by function 1.
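
As a rough aid to reading the centroid table, the sketch below splits the groups by the sign of their centroid on each function. The values are copied from the table above; the helper name split_by_sign is hypothetical. A plain sign split is only a first pass: on function 2 it places group 2 (-0.031) with the negative groups, whereas the interpretation above sets group 2 aside because function 1 already separates it.

```python
# Centroids (function 1, function 2) copied from "Functions at Group Centroids".
centroids = {
    "1 TOO LITTLE":  (-0.220,  0.235),
    "2 ABOUT RIGHT": ( 0.446, -0.031),
    "3 TOO MUCH":    (-0.311, -0.362),
}

def split_by_sign(centroids, function_index):
    """Return (groups with positive centroid, groups with negative centroid)."""
    positive = [g for g, c in centroids.items() if c[function_index] > 0]
    negative = [g for g, c in centroids.items() if c[function_index] < 0]
    return positive, negative

for i in range(2):
    pos, neg = split_by_sign(centroids, i)
    print(f"Function {i + 1}: positive={pos} negative={neg}")
```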

                                                   Structure Matrix

    Slide 10




                                        Structure Matrix

                                                                  Function
                                                               1            2
              HIGHEST YEAR OF SCHOOL COMPLETED                .687*        .136
              NUMBER OF HOURS WORKED LAST WEEK               -.582*        .345
              R SELF-EMP OR WORKS FOR SOMEBODY                .223         .889*
              RESPONDENTS INCOME (a)                          .101         .292*

              Pooled within-groups correlations between discriminating
              variables and standardized canonical discriminant functions.
              Variables ordered by absolute size of correlation within function.
              *. Largest absolute correlation between each variable and
                 any discriminant function
              a. This variable not used in the analysis.

              We do not interpret loadings in the structure matrix unless
              they are 0.30 or higher.

              Based on the structure matrix, the predictor variables strongly
              associated with discriminant function 1 which distinguished
              between survey respondents who thought we spend about the right
              amount of money on welfare and survey respondents who thought we
              spend too much or little money on welfare were number of hours
              worked in the past week (r=-0.582) and highest year of school
              completed (r=0.687).

              Based on the structure matrix, the predictor variable strongly
              associated with discriminant function 2 which distinguished
              between survey respondents who thought we spend too little money
              on welfare and survey respondents who thought we spend too much
              money on welfare was self-employment (r=0.889).
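
The reading rule for the structure matrix (interpret each variable on the function where its loading is largest in absolute value, and only if that loading is at least 0.30) can be sketched as follows. The loadings are copied from the table above for the three variables used in the analysis; the helper name variables_to_interpret is hypothetical.

```python
# Loadings (function 1, function 2) for the variables used in the analysis.
loadings = {
    "HIGHEST YEAR OF SCHOOL COMPLETED": (0.687, 0.136),
    "NUMBER OF HOURS WORKED LAST WEEK": (-0.582, 0.345),
    "R SELF-EMP OR WORKS FOR SOMEBODY": (0.223, 0.889),
}
THRESHOLD = 0.30  # loadings below this are not interpreted

def variables_to_interpret(loadings, threshold=THRESHOLD):
    """Map function number -> [(variable, loading)] using the highest-loading rule."""
    result = {}
    for variable, row in loadings.items():
        best = max(range(len(row)), key=lambda i: abs(row[i]))
        if abs(row[best]) >= threshold:
            result.setdefault(best + 1, []).append((variable, row[best]))
    return result

for function, variables in variables_to_interpret(loadings).items():
    print(function, variables)
```

Hours worked also loads 0.345 on function 2, but under this rule it is interpreted only on function 1, where its loading is larger in absolute value.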

                                                  Group Statistics

    Slide 11



                                           Group Statistics

                                                                                   Valid N (listwise)
           WELFARE                                            Mean   Std. Deviation   Unweighted   Weighted
           1 TOO LITTLE    NUMBER OF HOURS WORKED LAST WEEK   43.96       13.240           56        56.000
                           HIGHEST YEAR OF SCHOOL COMPLETED   13.73        2.401           56        56.000
                           R SELF-EMP OR WORKS FOR SOMEBODY    1.93         .260           56        56.000
                           RESPONDENTS INCOME                 13.70        5.034           56        56.000
           2 ABOUT RIGHT   NUMBER OF HOURS WORKED LAST WEEK   37.90       13.235           50        50.000
                           HIGHEST YEAR OF SCHOOL COMPLETED   14.78        2.558           50        50.000
                           R SELF-EMP OR WORKS FOR SOMEBODY    1.90         .303           50        50.000
                           RESPONDENTS INCOME                 14.00        5.503           50        50.000
           3 TOO MUCH      NUMBER OF HOURS WORKED LAST WEEK   42.03       10.456           32        32.000
                           HIGHEST YEAR OF SCHOOL COMPLETED   13.38        2.524           32        32.000
                           R SELF-EMP OR WORKS FOR SOMEBODY    1.75         .440           32        32.000
                           RESPONDENTS INCOME                 14.75        5.304           32        32.000
           Total           NUMBER OF HOURS WORKED LAST WEEK   41.32       12.846          138       138.000
                           HIGHEST YEAR OF SCHOOL COMPLETED   14.03        2.537          138       138.000
                           R SELF-EMP OR WORKS

           The average number of hours worked in the past week for survey
           respondents who thought we spend about the right amount of money on
           welfare (mean=37.90) was lower than the average number of hours worked
           in the past week for survey respondents who thought we spend too much
           money on welfare (mean=42.03) and survey respondents who thought we
           spend too little money on welfare (mean=43.96).

           This enables us to make the statement: "survey respondents who
           thought we spend about the right amount of money on welfare worked
           fewer hours in the past week than survey respondents who thought we
           spend too much or little money on welfare."

                       Which independent variables to interpret

    Slide 12




                     In a simultaneous discriminant analysis, in which all
                      independent variables are entered together, we only interpret
                      the relationships for independent variables that have a loading
                      of 0.30 or higher on one or more discriminant functions. A
                      variable can have a high loading on more than one function,
                      which complicates the interpretation. We will interpret the
                      variable for the function on which it has the highest loading.

                     In a stepwise discriminant analysis, we limit the interpretation
                      of relationships between independent variables and groups
                      defined by the dependent variable to those independent
                      variables that met the statistical test for inclusion in the
                      analysis.

                        Discriminant analysis and classification

    Slide 13



                     Discriminant analysis consists of two stages: in the first stage,
                      the discriminant functions are derived; in the second stage, the
                      discriminant functions are used to classify the cases.

                     While discriminant analysis does compute correlation measures
                      to estimate the strength of the relationship, these correlations
                      measure the relationship between the independent variables
                      and the discriminant scores.

                     A more useful measure to assess the utility of a discriminant
                      model is classification accuracy, which compares predicted
                      group membership based on the discriminant model to the
                      actual, known group membership which is the value for the
                      dependent variable.

                  Evaluating usefulness for discriminant models

    Slide 14




                     The benchmark that we will use to characterize a discriminant
                      model as useful is a 25% improvement over the rate of accuracy
                      achievable by chance alone.

                     Even if the independent variables had no relationship to the
                      groups defined by the dependent variable, we would still
                      expect to be correct in our predictions of group membership
                      some percentage of the time. This is referred to as by chance
                      accuracy.

                     The estimate of by chance accuracy that we will use is the
                      proportional by chance accuracy rate, computed by squaring
                      the proportion of cases in each group and summing the results.

                                  Comparing accuracy rates

    Slide 15


                     To characterize our model as useful, we compare the cross-
                      validated accuracy rate produced by SPSS to 25% more than the
                      proportional by chance accuracy.

                     The cross-validated accuracy rate is a one-at-a-time hold out
                      method that classifies each case based on a discriminant
                      solution for all of the other cases in the analysis. It is a more
                      realistic estimate of the accuracy rate we should expect in the
                      population because discriminant analysis inflates accuracy rates
                      when the cases classified are the same cases used to derive the
                      discriminant functions.

                     Cross-validated accuracy rates are not produced by SPSS when
                      separate covariance matrices are used in the classification,
                      which we address more next week.
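
The one-at-a-time hold out idea can be sketched without SPSS. The example below is not a discriminant analysis: it swaps in a much simpler stand-in classifier (assign a case to the group whose mean on a single variable is closest) purely to show the hold-out loop. The data and all function names are invented for illustration.

```python
def nearest_mean_classifier(train):
    """Stand-in classifier: predict the group whose training mean is closest."""
    sums, counts = {}, {}
    for value, group in train:
        sums[group] = sums.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return lambda value: min(means, key=lambda g: abs(value - means[g]))

def resubstitution_accuracy(cases):
    """Classify the same cases that were used to fit the classifier."""
    predict = nearest_mean_classifier(cases)
    return sum(predict(v) == g for v, g in cases) / len(cases)

def leave_one_out_accuracy(cases):
    """Classify each case with a classifier fitted on all the other cases."""
    correct = 0
    for i, (value, group) in enumerate(cases):
        predict = nearest_mean_classifier(cases[:i] + cases[i + 1:])
        correct += predict(value) == group
    return correct / len(cases)

# Invented data: (value of one independent variable, known group).
cases = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (5.2, "A"),
         (4.8, "B"), (6.0, "B"), (7.0, "B"), (8.0, "B")]

print(resubstitution_accuracy(cases))  # 0.875
print(leave_one_out_accuracy(cases))   # 0.75
```

On this toy data the held-out rate is lower than the rate obtained by classifying the same cases used to fit the classifier, which is the inflation described above.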

                                Computing by chance accuracy

    Slide 16




                     The proportion of cases in each group defined by the
                      dependent variable is reported in the table "Prior
                      Probabilities for Groups."

                                 Prior Probabilities for Groups


                                                   Cases Used in Analysis
                        WELFARE          Prior     Unweighted   Weighted
                        1 TOO LITTLE        .406           56      56.000
                        2 ABOUT RIGHT       .362           50      50.000
                        3 TOO MUCH          .232           32      32.000
                        Total             1.000           138     138.000
                                                                  The proportional by chance accuracy
                                                                  rate was computed by squaring and
                                                                  summing the proportion of cases in
                                                                  each group from the table of prior
                                                                  probabilities for groups (0.406² +
                                                                  0.362² + 0.232² = 0.350).

                                                                  A 25% increase over this would require
                                                                  that our cross-validated accuracy be
                                                                  43.7% (1.25 x 35.0% = 43.7%).
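
The same arithmetic as a short sketch, using the prior probabilities from the table above. The variable names are hypothetical, and the 25% improvement factor is the benchmark adopted in these slides.

```python
# Prior probabilities copied from "Prior Probabilities for Groups".
priors = [0.406, 0.362, 0.232]

# Proportional by chance accuracy: sum of squared group proportions.
chance_accuracy = sum(p ** 2 for p in priors)

# Benchmark: 25% improvement over by chance accuracy.
criterion = 1.25 * chance_accuracy

print(f"{chance_accuracy:.3f}")  # 0.350
print(f"{criterion:.3f}")        # 0.437
```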

                  Comparing the cross-validated accuracy rate

    Slide 17




                                                        Classification Results (b, c)

                                                                      Predicted Group Membership
                                                                  1 TOO      2 ABOUT
                                               WELFARE            LITTLE       RIGHT    3 TOO MUCH                   Total
                   Original            Count   1 TOO LITTLE             43           15             6                     64
                                               2 ABOUT RIGHT            26           30             6                     62
                                               3 TOO MUCH               17           10             9                     36
                                               Ungrouped cases           3            3             2                      8
                                       %       1 TOO LITTLE           67.2         23.4           9.4                  100.0
                                               2 ABOUT RIGHT          41.9         48.4           9.7                  100.0
                                               3 TOO MUCH             47.2         27.8          25.0                  100.0
                                               Ungrouped cases        37.5         37.5          25.0                  100.0
                   Cross-validated (a) Count   1 TOO LITTLE             43           15             6                     64
                                               2 ABOUT RIGHT            26           30             6                     62
                                               3 TOO MUCH               17           11             8                     36
                                       %       1 TOO LITTLE           67.2         23.4           9.4                  100.0
                                               2 ABOUT RIGHT          41.9         48.4           9.7                  100.0
                                               3 TOO MUCH             47.2         30.6          22.2                  100.0
                     a. Cross validation is done only for those cases in the analysis. In cross validation, each case is
                        classified by the functions derived from all cases other than that case.
                     b. 50.6% of original grouped cases correctly classified.
                     c. 50.0% of cross-validated grouped cases correctly classified.

                     SPSS reports the cross-validated accuracy rate in the footnotes to the table
                     "Classification Results." The cross-validated accuracy rate computed by SPSS
                     was 50.0%, which was greater than or equal to the proportional by chance
                     accuracy criterion of 43.7%.
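
The footnoted accuracy rates can be reproduced from the counts in the table: correctly classified cases lie on the diagonal. The sketch below copies the grouped-case counts from the table above (ungrouped cases excluded); the function name accuracy_rate and the variable names are hypothetical, and 0.437 is the criterion stated in these slides.

```python
# Rows = actual group, columns = predicted group (1 TOO LITTLE, 2 ABOUT RIGHT, 3 TOO MUCH).
original = [
    [43, 15, 6],
    [26, 30, 6],
    [17, 10, 9],
]
cross_validated = [
    [43, 15, 6],
    [26, 30, 6],
    [17, 11, 8],
]

def accuracy_rate(table):
    """Share of cases on the diagonal (predicted group equals actual group)."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total

criterion = 0.437  # 25% improvement over by chance accuracy, from the slides

print(f"{accuracy_rate(original):.1%}")         # 50.6%
print(f"{accuracy_rate(cross_validated):.1%}")  # 50.0%
print(accuracy_rate(cross_validated) >= criterion)
```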

                                                         Problem 1

    Slide 18


                  1. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
                  application of a statistic? Assume that there is no problem with missing data, violation of
                  assumptions, or outliers. Use a level of significance of 0.05 for evaluating the statistical
                  relationship.
                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had seen
                  an x-rated movie in the last year from survey respondents who had not seen an x-rated movie
                  in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than survey
                  respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                  seen an x-rated movie in the last year were more likely to be male than survey respondents
                  who had not seen an x-rated movie in the last year.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic

                                           Dissecting problem 1 - 1

    Slide 19


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
                  application of a statistic? Assume that there is no problem with missing data, violation of
                  assumptions, or outliers. Use a level of significance of 0.05 for evaluating the statistical
                  relationship.
                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had seen
                  an x-rated movie in the last year from survey respondents who had not seen an x-rated movie
                  in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than survey
                  respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                  seen an x-rated movie in the last year were more likely to be male than survey respondents
                  who had not seen an x-rated movie in the last year.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic

                  For these problems, we will assume that there is no problem with missing data,
                  violation of assumptions, or outliers.

                  In this problem, we are told to use 0.05 as alpha for the discriminant analysis.

                                          Dissecting problem 1 - 2

    Slide 20




                  1. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
                  application of a statistic? Assume that there is no problem with missing data, violation of
                  assumptions, or outliers. Use a level of significance of 0.05 for evaluating the statistical
                  relationship.
                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and
                  "income" [rincom98] are useful in distinguishing between groups based on responses to
                  "seen x-rated movie in last year" [xmovie]. These predictors differentiate survey respondents
                  who had seen an x-rated movie in the last year from survey respondents who had not seen an
                  x-rated movie in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than survey
                  respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                  seen an x-rated movie in the last year were more likely to be male than survey respondents
                  who had not seen an x-rated movie in the last year.

                  The variables listed first in the problem statement are the independent variables
                  (IVs): "age" [age], "highest year of school completed" [educ], "sex" [sex], and
                  "income" [rincom98].

                  The variable used to define groups is the dependent variable (DV): "seen x-rated
                  movie in last year" [xmovie].

                  When a problem states that a list of independent variables can distinguish among
                  groups, we do a discriminant analysis entering all of the variables simultaneously.
SW388R7

                                           Dissecting problem 1 - 3
Data Analysis &
 Computers II

   Slide 21


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                  of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                  outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had
                  seen an x-rated movie in the last year from survey respondents who had not seen an x-
                  rated movie in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than survey
                  respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                  seen an x-rated movie in the last year were more likely to be male than survey respondents
                  who had not seen an x-rated movie in the last year.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic

                  The problem identifies two groups for the dependent variable:
                      • survey respondents who had seen an x-rated movie in the last year
                      • survey respondents who had not seen an x-rated movie in the last year

                  To distinguish between two groups, the analysis will be required to find one
                  statistically significant discriminant function.
SW388R7

                                           Dissecting problem 1 - 4
Data Analysis &
 Computers II

   Slide 22




                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had seen
                  an x-rated movie in the last year from survey respondents who had not seen an x-rated movie
                  in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than
                  survey respondents who had not seen an x-rated movie in the last year. Survey respondents
                  who had seen an x-rated movie in the last year were more likely to be male than survey
                  respondents who had not seen an x-rated movie in the last year.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic

                  The specific relationships listed in the problem indicate how each independent
                  variable relates to groups of the dependent variable, i.e., the mean for age will
                  be lower for respondents who had seen an x-rated movie in the last year.

                  In order for the problem statement to be true, we must have enough statistically
                  significant functions to distinguish among the groups, the classification accuracy
                  rate must be substantially better than could be obtained by chance alone, and each
                  significant relationship must be interpreted correctly.
SW388R7

                                     LEVEL OF MEASUREMENT - 1
Data Analysis &
 Computers II

   Slide 23


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                  of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                  outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.

                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had
                  seen an x-rated movie in the last year from survey respondents who had not seen an x-
                  rated movie in the last year.

                  Survey respondents who had seen an x-rated movie in the last year were younger than survey
                  respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                  seen an x-rated movie in the last year were more likely to be male than survey respondents
                  who had not seen an x-rated movie in the last year.
                    1. True
                    2. True with caution
                    3. False
                    4. Inappropriate application of a statistic

                  Discriminant analysis requires that the dependent variable be non-metric and the
                  independent variables be metric or dichotomous. "seen x-rated movie in last year"
                  [xmovie] is a dichotomous variable, which satisfies the level of measurement
                  requirement.

                  It contains two categories: survey respondents who had seen an x-rated movie in the
                  last year and survey respondents who had not seen an x-rated movie in the last year.
SW388R7

                                      LEVEL OF MEASUREMENT - 2
Data Analysis &
 Computers II

   Slide 24


                   In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                   of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                   outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.

                   The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and
                   "income" [rincom98] are useful in distinguishing between groups based on responses to "seen
                   x-rated movie in last year" [xmovie]. These predictors differentiate survey respondents who
                   had seen an x-rated movie in the last year from survey respondents who had not seen an x-
                   rated movie in the last year.

                   Survey respondents who had seen an x-rated movie in the last year were younger than survey
                   respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                   seen an x-rated movie in the last year were more likely to be male than survey respondents
                   who had not seen an x-rated movie in the last year.

                      1. True
                      2. True with caution
                      3. False
                      4. Inappropriate application of a statistic

                   "Age" [age] and "highest year of school completed" [educ] are interval level
                   variables, which satisfy the level of measurement requirements for discriminant
                   analysis.

                   "Income" [rincom98] is an ordinal level variable. If we follow the convention of
                   treating ordinal level variables as metric variables, the level of measurement
                   requirement for discriminant analysis is satisfied. Since some data analysts do
                   not agree with this convention, a note of caution should be included in our
                   interpretation.

                   "Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in
                   discriminant analysis.
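
The level of measurement screening on the last two slides can be sketched as a small rule check. This is only an illustration: the function name, the level labels, and the three return values are assumptions introduced here, not part of SPSS or the course materials.

```python
# Hypothetical helper illustrating the level-of-measurement rules stated above.
METRIC = {"interval", "ratio"}

def check_levels(dv_level, iv_levels):
    """Return 'inappropriate', 'caution', or 'ok' for a proposed discriminant analysis."""
    # The dependent variable must be non-metric (it defines the groups).
    if dv_level in METRIC:
        return "inappropriate"
    result = "ok"
    for name, level in iv_levels.items():
        if level in METRIC or level == "dichotomous":
            continue  # metric or dichotomous predictors satisfy the requirement
        if level == "ordinal":
            result = "caution"  # allowed only by convention, so flag it
        else:
            return "inappropriate"  # e.g. a multi-category nominal predictor
    return result

ivs = {"age": "interval", "educ": "interval", "sex": "dichotomous", "rincom98": "ordinal"}
verdict = check_levels("dichotomous", ivs)
print(verdict)
```

For the variables in this problem the ordinal income variable is the only one that triggers the caution.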
SW388R7

                  Request simultaneous discriminant analysis
Data Analysis &
 Computers II

   Slide 25




                                               Select the Classify |
                                               Discriminant… command
                                               from the Analyze menu.
SW388R7

                              Selecting the dependent variable
Data Analysis &
 Computers II

   Slide 26




                  First, highlight the
                  dependent variable
                  xmovie in the list
                  of variables.


                                                 Second, click on the right
                                                 arrow button to move the
                                                 dependent variable to the
                                                 Grouping Variable text box.
SW388R7

                     Defining the group values
Data Analysis &
 Computers II

   Slide 27




                  When SPSS moves the dependent variable to the
                  Grouping Variable textbox, it puts two question marks in
                  parentheses after the variable name. This is a reminder
                  that we have to enter the numbers that represent the
                  groups we want to include in the analysis.




                                First, to specify the
                                group numbers, click
                                on the Define Range…
                                button.
SW388R7

                          Completing the range of group values
Data Analysis &
 Computers II

   Slide 28




                  The value labels for xmovie show
                  two categories:
                        1 = YES
                        2 = NO

                  The range of values that we need
                  to enter goes from 1 as the
                  minimum to 2 as the maximum.

                  First, type in 1 in the Minimum text box.

                  Second, type in 2 in the Maximum text box.

                  Third, click on the Continue button to close the dialog box.
SW388R7

                  Selecting the independent variables
Data Analysis &
 Computers II

   Slide 29




                  Move the independent
                  variables listed in the
                  problem to the
                  Independents list box.
SW388R7

                  Specifying the method for including variables
Data Analysis &
 Computers II

   Slide 30



                                  SPSS provides us with two methods for including
                                  variables: to enter all of the independent variables
                                  at one time, and a stepwise method for selecting
                                  variables using a statistical test to determine the
                                  order in which variables are included.




                      Since the problem
                      states that there is a
                      relationship without
                      requesting the best
                      predictors, we accept
                      the default to Enter
                      independents together.
SW388R7

                  Requesting statistics for the output
Data Analysis &
 Computers II

   Slide 31




                                  Click on the Statistics…
                                  button to select statistics
                                  we will need for the
                                  analysis.
SW388R7

                                   Specifying statistical output
Data Analysis &
 Computers II

   Slide 32




                  First, mark the Means
                  checkbox on the Descriptives
                  panel. We will use the group
                  means in our interpretation.




                  Second, mark the Univariate
                  ANOVAs checkbox on the
                  Descriptives panel. Perusing
                  these tests suggests which
                  variables might be useful
                  discriminators.




                   Third, mark the Box’s M
                   checkbox. Box’s M statistic
                   evaluates conformity to the
                   assumption of homogeneity of
                   group variances.

                   Fourth, click on the
                   Continue button to
                   close the dialog box.
SW388R7

                  Specifying details for classification
Data Analysis &
 Computers II

   Slide 33




                           Click on the Classify…
                           button to specify details for
                           the classification phase of
                           the analysis.
SW388R7

                                           Details for classification - 1
Data Analysis &
 Computers II

   Slide 34




                                                    First, mark the option button to Compute from
                                                    group sizes on the Prior Probabilities panel.
                                                    This incorporates the size of the groups defined
                                                    by the dependent variable into the classification
                                                    of cases using the discriminant functions.

                  Second, mark the
                  Casewise results
                  checkbox on the
                  Display panel to
                  include
                  classification details
                  for each case in the
                  output.




                     Third, mark the Summary
                     table checkbox to include
                     summary tables
                     comparing actual and
                     predicted classification.
SW388R7

                  Details for classification - 2
Data Analysis &
 Computers II

   Slide 35




                        Fourth, mark the Leave-one-out
                        classification checkbox to request SPSS to
                        include a cross-validated classification in
                        the output. This option produces a less
                        biased estimate of classification accuracy
                        by sequentially holding each case out of
                        the calculations for the discriminant
                        functions, and using the derived functions
                        to classify the case held out.
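
The leave-one-out idea can be sketched in a few lines. The example below is a simplified illustration, not SPSS's implementation: it assumes a single predictor, a pooled within-group variance, priors computed from group sizes, and made-up toy data.

```python
import math

def classify(x, groups):
    """Assign x to the group with the highest linear discriminant score.

    groups maps a group label to the list of training values for that group.
    Priors come from group sizes; the variance is pooled within groups.
    """
    n_total = sum(len(v) for v in groups.values())
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_within = sum((xi - means[g]) ** 2 for g, v in groups.items() for xi in v)
    pooled_var = ss_within / (n_total - len(groups))

    def score(g):
        prior = len(groups[g]) / n_total
        return math.log(prior) - (x - means[g]) ** 2 / (2 * pooled_var)

    return max(groups, key=score)

def leave_one_out_accuracy(groups):
    """Hold each case out in turn, refit on the rest, and classify the held-out case."""
    hits = 0
    total = 0
    for g, values in groups.items():
        for i, x in enumerate(values):
            training = dict(groups)
            training[g] = values[:i] + values[i + 1:]  # drop the held-out case
            if classify(x, training) == g:
                hits += 1
            total += 1
    return hits / total

# Toy data (hypothetical): two well-separated groups on one predictor.
toy = {1: [1.0, 1.2, 0.8, 1.1], 2: [3.0, 3.2, 2.9, 3.1, 3.3]}
loo_accuracy = leave_one_out_accuracy(toy)
print(loo_accuracy)
```

Because the held-out case never contributes to the means or variance used to classify it, the resulting accuracy is less optimistic than classifying the same cases that built the function.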
SW388R7

                                        Details for classification - 3
Data Analysis &
 Computers II

   Slide 36




                  Fifth, accept the default Within-groups
                  option button on the Use Covariance Matrix
                  panel. The covariance matrices are the
                  measure of the dispersion in the groups
                  defined by the dependent variable. If we
                  fail the homogeneity of group variances
                  test (Box’s M), our option is to use Separate-
                  groups covariance in classification.

                  Sixth, mark the Combined-groups checkbox on the Plots
                  panel to obtain a visual plot of the relationship between
                  functions and groups defined by the dependent variable.

                  Seventh, click on the Continue button to close
                  the dialog box.
SW388R7

                  Completing the discriminant analysis request
Data Analysis &
 Computers II

   Slide 37




                                                 Click on the OK
                                                 button to request the
                                                 output for the
                                                 discriminant analysis.
SW388R7

                  Sample size – ratio of cases to variables
Data Analysis &
 Computers II

   Slide 38



                               Analysis Case Processing Summary

                    Unweighted Cases                                        N    Percent
                    Valid                                                 119       44.1
                    Excluded   Missing or out-of-range group codes         49       18.1
                               At least one missing
                               discriminating variable                     66       24.4
                               Both missing or out-of-range
                               group codes and at least one
                               missing discriminating variable             36       13.3
                               Total                                      151       55.9
                    Total                                                 270      100.0

                  The minimum ratio of valid cases to independent variables for discriminant
                  analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 119
                  valid cases and 4 independent variables. The ratio of cases to independent variables is
                  29.75 to 1, which satisfies the minimum requirement. In addition, the ratio of 29.75 to
                  1 satisfies the preferred ratio of 20 to 1.
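
The ratio check is simple arithmetic and can be reproduced as follows. The variable and function names are illustrative; the counts and the 5-to-1 and 20-to-1 thresholds are the ones given on this slide.

```python
def case_ratio(valid_cases, n_independents):
    """Ratio of valid cases to independent variables."""
    return valid_cases / n_independents

ratio = case_ratio(119, 4)          # 119 valid cases, 4 independent variables
meets_minimum = ratio >= 5          # minimum ratio of 5 to 1
meets_preferred = ratio >= 20       # preferred ratio of 20 to 1
print(ratio, meets_minimum, meets_preferred)
```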
SW388R7

                        Sample size – minimum group size
Data Analysis &
 Computers II

   Slide 39



                           Prior Probabilities for Groups

                                         Cases Used in Analysis
                  XMOVIE       Prior     Unweighted   Weighted
                  1               .311           37      37.000
                  2               .689           82      82.000
                  Total          1.000          119     119.000

                  In addition to the requirement for the ratio of cases to independent
                  variables, discriminant analysis requires that there be a minimum
                  number of cases in the smallest group defined by the dependent variable.
                  The number of cases in the smallest group must be larger than the number
                  of independent variables, and the group should preferably contain 20 or
                  more cases.

                  The number of cases in the smallest group in this problem is 37, which is
                  larger than the number of independent variables (4), satisfying
                  the minimum requirement. In addition, the number of cases in the
                  smallest group satisfies the preferred minimum of 20 cases.

                  If the sample size does not satisfy the minimum
                  requirements, discriminant analysis is not appropriate.
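
The prior probabilities in the table follow directly from the group sizes, and the group-size requirement is a comparison against the number of predictors. A short sketch, with names chosen here for illustration:

```python
group_sizes = {1: 37, 2: 82}        # cases used in the analysis, by xmovie code
n_independents = 4

total = sum(group_sizes.values())
# "Compute from group sizes": each prior is the group's share of the valid cases.
priors = {g: n / total for g, n in group_sizes.items()}

smallest = min(group_sizes.values())
meets_minimum = smallest > n_independents   # must exceed the number of predictors
meets_preferred = smallest >= 20            # preferred minimum of 20 cases

print({g: round(p, 3) for g, p in priors.items()}, meets_minimum, meets_preferred)
```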
SW388R7

                  NUMBER OF DISCRIMINANT FUNCTIONS - 1
Data Analysis &
 Computers II

   Slide 40




                               The maximum possible number of discriminant
                               functions is the smaller of one less than the
                               number of groups defined by the dependent
                               variable and the number of independent
                               variables.

                               In this analysis there were 2 groups defined by
                               seen x-rated movie in last year and 4
                               independent variables, so the maximum
                               possible number of discriminant functions was
                               1.
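
The rule for the maximum number of functions can be written as a one-line function. The function name is illustrative only.

```python
def max_discriminant_functions(n_groups, n_independents):
    """Smaller of (number of groups - 1) and the number of independent variables."""
    return min(n_groups - 1, n_independents)

max_functions = max_discriminant_functions(n_groups=2, n_independents=4)
print(max_functions)
```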
SW388R7

                  NUMBER OF DISCRIMINANT FUNCTIONS - 2
Data Analysis &
 Computers II

   Slide 41




                                        In the table of Wilks' Lambda
                                        which tested functions for
                                        statistical significance, the direct
                                        analysis identified 1 discriminant
                                        function that was statistically
                                        significant. The Wilks' lambda
                                        statistic for the test of function 1
                                        (chi-square=24.159) had a
                                        probability of <0.001 which was
                                        less than or equal to the level
                                        of significance of 0.05. The
                                        significance of the maximum
                                        possible number of discriminant
                                        functions supports the
                                        interpretation of a solution using
                                        1 discriminant function.
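
The reported probability can be checked from the chi-square value. The slide does not give the degrees of freedom; the sketch below assumes df = predictors × (groups − 1) = 4, the usual value for the test of the first function, and uses the closed-form upper-tail probability that holds for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

df = 4 * (2 - 1)                      # assumption: 4 predictors, 2 groups
p_value = chi2_sf_even_df(24.159, df)
significant = p_value <= 0.05         # level of significance from the problem
print(p_value < 0.001, significant)
```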
SW388R7

                  Independent variables and group membership:
                  relationship of functions to groups
Data Analysis &
 Computers II

   Slide 42




                    In order to specify the role that each independent
                    variable plays in predicting group membership on the
                    dependent variable, we must link together the
                    relationship between the discriminant functions and the
                    groups defined by the dependent variable, the role of
                    the significant independent variables in the
                    discriminant functions, and the differences in group
                    means for each of the variables.




                    Each function divides the groups into two subgroups by assigning negative
                    values to one subgroup and positive values to the other subgroup. Function 1
                    separates survey respondents who had seen an x-rated movie in the last year
                    (-.714) from survey respondents who had not seen an x-rated movie in the
                    last year (.322).

                             Functions at Group Centroids
                                         Function
                              XMOVIE        1
                              1             -.714
                              2              .322
                              Unstandardized canonical discriminant
                              functions evaluated at group means
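
One way to sanity-check the centroids: discriminant scores are centered on the overall mean, so the centroids weighted by group size should average to roughly zero. The sketch below checks this and also shows which centroid a given score lies closer to. The nearest-centroid helper is a simplification introduced here; it ignores prior probabilities and is not the classification rule SPSS applies.

```python
centroids = {1: -0.714, 2: 0.322}   # from Functions at Group Centroids
group_sizes = {1: 37, 2: 82}        # from Prior Probabilities for Groups

total = sum(group_sizes.values())
weighted_mean = sum(group_sizes[g] * centroids[g] for g in centroids) / total

def nearest_centroid(score):
    """Group whose centroid is closest to a discriminant score (priors ignored)."""
    return min(centroids, key=lambda g: abs(score - centroids[g]))

print(abs(weighted_mean) < 0.001, nearest_centroid(-1.0), nearest_centroid(0.5))
```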
SW388R7
Data Analysis &   Independent variables and group membership:
                         predictor loadings on functions
 Computers II

   Slide 43



                   We do not interpret loadings in the structure matrix unless they
                   are 0.30 or higher.

                   Based on the structure matrix, the predictor variables strongly associated with
                   discriminant function 1, which distinguished between survey respondents who had seen
                   an x-rated movie in the last year and survey respondents who had not seen an
                   x-rated movie in the last year, were age (r=0.467) and sex (r=0.770).


                                    Structure Matrix

                                                Function
                                                   1
                                 SEX                 .770
                                 AGE                 .467
                                 EDUC                .118
                                 RINCOM98            .044
                                 Pooled within-groups correlations between discriminating
                                 variables and standardized canonical discriminant functions
                                 Variables ordered by absolute size of correlation within function.
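
Applying the 0.30 cutoff to the structure matrix is a simple filter. The dictionary below copies the loadings from the table; comparing on absolute value is an assumption made so that negative loadings would be treated the same way.

```python
loadings = {"SEX": 0.770, "AGE": 0.467, "EDUC": 0.118, "RINCOM98": 0.044}

CUTOFF = 0.30  # loadings below this are not interpreted
interpretable = [name for name, r in loadings.items() if abs(r) >= CUTOFF]
print(interpretable)
```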
SW388R7
Data Analysis &   Independent variables and group membership:
                    predictors associated with first function - 1
 Computers II

   Slide 44




                                        Group Statistics

                                                                   Valid N (listwise)
                  XMOVIE              Mean     Std. Deviation   Unweighted      Weighted
                  1        AGE         37.24          10.838             37        37.000
                           EDUC        13.86           2.720             37        37.000
                           SEX          1.27            .450             37        37.000
                           RINCOM98    13.76           5.209             37        37.000
                  2        AGE         42.70          11.461             82        82.000
                           EDUC        14.18           2.534             82        82.000
                           SEX          1.65            .481             82        82.000
                           RINCOM98    14.00           5.308             82        82.000
                  Total    AGE         41.00          11.508            119       119.000
                           EDUC        14.08           2.586            119       119.000
                           SEX          1.53            .501            119       119.000
                           RINCOM98    13.92           5.256            119       119.000

                  The average age for survey respondents who had seen an x-rated movie in the last year
                  (mean=37.24) was lower than the average age for survey respondents who had not seen an
                  x-rated movie in the last year (mean=42.70).

                  This supports the relationship that "survey respondents who had seen an x-rated movie
                  in the last year were younger than survey respondents who had not seen an x-rated
                  movie in the last year."
                      Independent variables and group membership:
                        predictors associated with first function - 2

   Slide 45



                                        Group Statistics

                                                                   Valid N (listwise)
                  XMOVIE              Mean     Std. Deviation   Unweighted      Weighted
                  1        AGE         37.24          10.838             37        37.000
                           EDUC        13.86           2.720             37        37.000
                           SEX          1.27            .450             37        37.000
                           RINCOM98    13.76           5.209             37        37.000
                  2        AGE         42.70          11.461             82        82.000
                           EDUC        14.18           2.534             82        82.000
                           SEX          1.65            .481             82        82.000
                           RINCOM98    14.00           5.308             82        82.000
                  Total    AGE         41.00          11.508            119       119.000
                           EDUC        14.08           2.586            119       119.000
                           SEX          1.53            .501            119       119.000
                           RINCOM98    13.92           5.256            119       119.000

                                                                         Since sex is a dichotomous variable,
                                                                         the mean is not directly interpretable.
                                                                         Its interpretation must take into
                                                                         account the coding by which 1
                                                                         corresponds to male and 2
                                                                         corresponds to female. The lower
                                                                         mean for survey respondents who
                                                                         had seen an x-rated movie in the last
                                                                         year (mean=1.27), when compared
                                                                         to the mean for survey respondents
                                                                         who had not seen an x-rated movie in
                                                                         the last year (mean=1.65), implies
                                                                         that the group contained more survey
                                                                         respondents who were male and
                                                                         fewer survey respondents who were
                                                                         female.

                                                                         This supports the relationship that
                                                                         "survey respondents who had seen
                                                                         an x-rated movie in the last year
                                                                         were more likely to be male than
                                                                         survey respondents who had not seen
                                                                         an x-rated movie in the last year."
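
The coding argument above can be made concrete. With sex coded 1 = male and 2 = female, the group mean minus 1 is the proportion of females in the group. The following is a minimal Python sketch of that arithmetic using the SEX means from the Group Statistics table; the function name is illustrative and is not part of SPSS.

```python
def share_coded_high(mean, low=1, high=2):
    """Proportion of cases coded `high` for a dichotomous variable coded low/high."""
    return (mean - low) / (high - low)

# Means of SEX from the Group Statistics table (1 = male, 2 = female).
seen = share_coded_high(1.27)       # respondents who had seen an x-rated movie
not_seen = share_coded_high(1.65)   # respondents who had not

print(f"seen:     {seen:.0%} female, {1 - seen:.0%} male")
print(f"not seen: {not_seen:.0%} female, {1 - not_seen:.0%} male")
```

Read this way, roughly 27% of the first group and 65% of the second group were female, which is the same comparison the slide makes with the raw means.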
              CLASSIFICATION USING THE DISCRIMINANT MODEL:
                          by chance accuracy rate

   Slide 46




                    The independent variables could be characterized as useful
                    predictors of membership in the groups defined by the
                    dependent variable if the cross-validated classification
                    accuracy rate was significantly higher than the accuracy
                    attainable by chance alone. Operationally, the cross-validated
                    classification accuracy rate should be at least 25% higher
                    than the proportional by chance accuracy rate.

                    The proportional by chance accuracy rate was computed by
                    squaring and summing the proportion of cases in each group
                    from the table of prior probabilities for groups (0.311² +
                    0.689² = 0.571).




                                       Prior Probabilities for Groups


                                                     Cases Used in Analysis
                              XMOVIE       Prior     Unweighted   Weighted
                              1               .311           37      37.000
                              2               .689           82      82.000
                              Total         1.000           119     119.000
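
The by chance computation can be reproduced in a few lines. This is a minimal Python sketch, assuming the two prior probabilities from the table above; the function and variable names are illustrative.

```python
def proportional_chance_rate(priors):
    """Proportional by chance accuracy: the sum of squared group proportions."""
    return sum(p ** 2 for p in priors)

priors = [0.311, 0.689]            # from the Prior Probabilities for Groups table
chance = proportional_chance_rate(priors)
criterion = 1.25 * chance          # a 25% improvement over chance

print(f"proportional by chance accuracy: {chance:.3f}")
print(f"criterion (1.25 x chance):       {criterion:.3f}")
```

The same function works for more than two groups, since it only sums the squared proportions.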
              CLASSIFICATION USING THE DISCRIMINANT MODEL:
                     criteria for classification accuracy

   Slide 47

                                                                      b,c
                                                 Classification Results

                                                                       Predicted Group
                                                                         Membership
                                              XMOVIE                    1           2             Total
                  Original            Count   1                             15          22             37
                                              2                             12          70             82
                                              Ungrouped cases               13          36             49
                                      %       1                           40.5        59.5          100.0
                                              2                           14.6        85.4          100.0
                                              Ungrouped cases             26.5        73.5          100.0
                  Cross-validated a   Count   1                             15          22             37
                                              2                             12          70             82
                                      %       1                           40.5        59.5          100.0
                                              2                           14.6        85.4          100.0
                    a. Cross validation is done only for those cases in the analysis. In cross
                       validation, each case is classified by the functions derived from all cases other
                       than that case.
                    b. 71.4% of original grouped cases correctly classified.
                    c. 71.4% of cross-validated grouped cases correctly classified.


                                              The cross-validated accuracy rate computed by SPSS
                                              was 71.4%, which was greater than or equal to the
                                              proportional by chance accuracy criterion of 71.4%
                                              (1.25 x 57.1% = 71.4%).

                                              The criterion for classification accuracy is satisfied.
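
The accuracy rate and the comparison can be checked directly from the cross-validated counts. This is a minimal Python sketch, assuming the counts shown in the Classification Results table; the variable names are illustrative. The comparison is made on percentages rounded to one decimal place, as on the slide. On the unrounded values the accuracy (85/119) is very slightly below the criterion, so the result depends on that rounding.

```python
# Cross-validated counts from the Classification Results table:
# rows = actual group, columns = predicted group.
counts = [[15, 22],
          [12, 70]]

total = sum(sum(row) for row in counts)
correct = sum(counts[i][i] for i in range(len(counts)))
accuracy = correct / total

# Proportional by chance rate computed from the group sizes (row totals).
chance = sum((sum(row) / total) ** 2 for row in counts)
criterion = 1.25 * chance

# Compare percentages rounded to one decimal place, as the slide does.
accuracy_pct = round(100 * accuracy, 1)
criterion_pct = round(100 * criterion, 1)
print(f"accuracy  = {accuracy_pct}%")
print(f"criterion = {criterion_pct}%")
print("criterion satisfied:", accuracy_pct >= criterion_pct)
```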
                          Answering the question in problem 1 - 1

   Slide 48


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                  of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                  outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had
                  seen an x-rated movie in the last year from survey respondents who had not seen an x-
                  rated movie in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than survey
                  respondents who had not seen an x-rated movie in the last year. Survey respondents who had
                  seen an x-rated movie in the last year were more likely to be male than survey respondents
                  who had not seen an x-rated movie in the last year.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic

                                    We found one statistically significant
                                    discriminant function, making it possible to
                                    distinguish between the two groups defined by
                                    the dependent variable.

                                    Moreover, the cross-validated classification
                                    accuracy satisfied the by chance accuracy
                                    criterion, supporting the utility of the model.
                          Answering the question in problem 1 - 2

   Slide 49


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                  of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                  outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
                  The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income"
                  [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated
                  movie in last year" [xmovie]. These predictors differentiate survey respondents who had seen
                  an x-rated movie in the last year from survey respondents who had not seen an x-rated movie
                  in the last year.
                  Survey respondents who had seen an x-rated movie in the last year were younger than
                  survey respondents who had not seen an x-rated movie in the last year. Survey respondents
                  who had seen an x-rated movie in the last year were more likely to be male than survey
                  respondents who had not seen an x-rated movie in the last year.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic

                                                       We verified that each statement
                                                       about the relationship between
                                                       predictors and groups was correct.

                                                                       The answer to the question is true
                                                                       with caution.

                                                                       A caution is added because of the
                                                                       inclusion of ordinal level variables.
                                                           Problem 2

   Slide 50


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a
                  statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers.
                  Use a level of significance of 0.05 for evaluating the statistical relationship.
                  From the list of variables "respondent's degree of religious fundamentalism" [fund], "frequency of
                  prayer" [pray], and "frequency of attendance at religious services" [attend], the most useful
                  predictor for distinguishing between groups based on responses to "attitude toward abortion when
                  there is a strong chance of serious defect in the baby" [abdefect] is "frequency of prayer" [pray].
                  These predictors differentiate survey respondents who thought it should be possible for a woman
                  to obtain a legal abortion if there is a strong chance of a serious defect in the baby from survey
                  respondents who didn't think it should be possible for a woman to obtain a legal abortion if there
                  is a strong chance of a serious defect in the baby.
                  The most important predictor of groups based on responses to attitude toward abortion when
                  there is a strong chance of serious defect in the baby was frequency of prayer.
                  Survey respondents who didn't think it should be possible for a woman to obtain a legal abortion if
                  there is a strong chance of a serious defect in the baby prayed more often than survey
                  respondents who thought it should be possible for a woman to obtain a legal abortion if there is a
                  strong chance of a serious defect in the baby.
                    1.   True
                    2.   True with caution
                    3.   False
                    4.   Inappropriate application of a statistic
                                       Dissecting problem 2 - 1

   Slide 51




                         The variables listed first in the problem
                         statement are the independent variables
                         (IVs): "respondent's degree of religious
                         fundamentalism" [fund], "frequency of
                         prayer" [pray], and "frequency of
                         attendance at religious services" [attend].
              In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a
              statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers.
              Use a level of significance of 0.05 for evaluating the statistical relationship.
              From the list of variables "respondent's degree of religious fundamentalism" [fund],
              "frequency of prayer" [pray], and "frequency of attendance at religious services" [attend], the
              most useful predictor for distinguishing between groups based on responses to "attitude toward
              abortion when there is a strong chance of serious defect in the baby" [abdefect] is "frequency
              of prayer" [pray]. These predictors differentiate survey respondents who thought it should be
              possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in
              the baby from survey respondents who didn't think it should be possible for a woman to obtain a
              legal abortion if there is a strong chance of a serious defect in the baby.
              The most important predictor of groups based on responses to attitude toward abortion when
              there is a strong chance of serious defect in the baby was frequency of prayer.

                                                                         The variable used to define
                                                                         groups is the dependent
                                                                         variable (DV): "attitude toward
                                                                         abortion when there is a
                                                                         strong chance of serious
                                                                         defect in the baby" [abdefect]

                        When a problem asks us
                        to identify the best or
                        most useful predictors
                        from a list of
                        independent variables,
                        we do stepwise
                        discriminant analysis.
                                            Dissecting problem 2 - 2

   Slide 52




                          The problem identifies two groups for the dependent variable:
                               •survey respondents who thought it should be possible for a woman
                               to obtain a legal abortion if there is a strong chance of a serious
                               defect in the baby
                               •survey respondents who didn't think it should be possible for a
                               woman to obtain a legal abortion if there is a strong chance of a
                               serious defect in the baby.

                           To distinguish between two groups, the analysis will be required to find one
                           statistically significant discriminant function.

                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a
                  statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers.
                  Use a level of significance of 0.05 for evaluating the statistical relationship.
                  From the list of variables "respondent's degree of religious fundamentalism" [fund], "frequency of
                  prayer" [pray], and "frequency of attendance at religious services" [attend], the most useful
                  predictor for distinguishing between groups based on responses to "attitude toward abortion when
                  there is a strong chance of serious defect in the baby" [abdefect] is "frequency of prayer" [pray].
                  These predictors differentiate survey respondents who thought it should be possible for a
                  woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby
                  from survey respondents who didn't think it should be possible for a woman to obtain a legal
                  abortion if there is a strong chance of a serious defect in the baby.
                  The most important predictor of groups based on responses to attitude toward abortion when
                  there is a strong chance of serious defect in the baby was frequency of prayer.

                                                    The importance of predictors is
                                                    based upon the stepwise
                                                    addition of variables to the
                                                    analysis.
                                         Dissecting problem 2 - 3

   Slide 53




                                 The specific relationships listed in the problem indicate how the
                                 independent variable relates to groups of the dependent variable, i.e.,
                                 the mean for frequency of prayer will be lower for respondents who
                                 thought it should be possible for a woman to obtain a legal abortion if
                                 there is a strong chance of a serious defect in the baby compared to
                                 survey respondents who didn't think it should be possible for a
                                 woman to obtain a legal abortion if there is a strong chance of a
                                 serious defect in the baby.

                  From the list of variables "respondent's degree of religious fundamentalism" [fund], "frequency of
                  prayer" [pray], and "frequency of attendance at religious services" [attend], the most useful
                  predictor for distinguishing between groups based on responses to "attitude toward abortion when
                  there is a strong chance of serious defect in the baby" [abdefect] is "frequency of prayer" [pray].
                  These predictors differentiate survey respondents who thought it should be possible for a woman
                  to obtain a legal abortion if there is a strong chance of a serious defect in the baby from survey
                  respondents who didn't think it should be possible for a woman to obtain a legal abortion if there
                  is a strong chance of a serious defect in the baby.
                  The most important predictor of groups based on responses to attitude toward abortion when
                  there is a strong chance of serious defect in the baby was frequency of prayer.
                  Survey respondents who didn't think it should be possible for a woman to obtain a legal
                  abortion if there is a strong chance of a serious defect in the baby prayed more often than
                  survey respondents who thought it should be possible for a woman to obtain a legal abortion if
                  there is a strong chance of a serious defect in the baby.
                   1. True
                   2. True with caution
                   3. False
                   4. Inappropriate application of a statistic

              In a stepwise analysis, we only
              interpret the independent
              variables that are entered in
              the stepwise analysis.

                                                                   In order for a stepwise analysis to be
                                                                   true, we must have enough statistically
                                                                   significant functions to distinguish among
                                                                   the groups, the order of entry must be
                                                                   correct, and each significant relationship
                                                                   must be interpreted correctly.
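
The conditions above can be summarized as a small decision rule. This is a hedged Python sketch, not part of SPSS: the function name and its boolean inputs are hypothetical, and it assumes the level of measurement requirement has already been checked separately. The "true with caution" branch follows the deck's convention of adding a caution when ordinal predictors are treated as metric.

```python
def evaluate_stepwise_statement(enough_significant_functions,
                                entry_order_correct,
                                relationships_correct,
                                has_ordinal_predictors):
    """Map the checks for a stepwise discriminant problem to an answer."""
    # Any failed check makes the statement false.
    if not (enough_significant_functions
            and entry_order_correct
            and relationships_correct):
        return "False"
    # All checks pass: add a caution if ordinal variables were treated as metric.
    if has_ordinal_predictors:
        return "True with caution"
    return "True"

print(evaluate_stepwise_statement(True, True, True, has_ordinal_predictors=True))
```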
                                     LEVEL OF MEASUREMENT - 1

   Slide 54


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                  of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                  outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
                  From the list of variables "respondent's degree of religious fundamentalism" [fund], "frequency
                  of prayer" [pray], and "frequency of attendance at religious services" [attend], the most useful
                  predictor for distinguishing between groups based on responses to "attitude toward abortion
                  when there is a strong chance of serious defect in the baby" [abdefect] is "frequency of prayer"
                  [pray]. These predictors differentiate survey respondents who thought it should be possible for
                  a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby
                  from survey respondents who didn't think it should be possible for a woman to obtain a legal
                  abortion if there is a strong chance of a serious defect in the baby.
                  The most important predictor of groups based on responses to attitude toward abortion when
                  there is a strong chance of serious defect in the baby was frequency of prayer.
                  Survey respondents who didn't think it should be possible for a woman to obtain a legal
                  abortion if there is a strong chance of a serious defect in the baby prayed more often than
                  survey respondents who thought it should be possible for a woman to obtain a legal abortion if
                  there is a strong chance of a serious defect in the baby.

                                         Discriminant analysis requires that the
                                         dependent variable be non-metric and the
                                         independent variables be metric or dichotomous.

                                         "Attitude toward abortion when there is a strong
                                         chance of serious defect in the baby" [abdefect]
                                         is a nominal level variable, which satisfies the
                                         level of measurement requirement.
                                     LEVEL OF MEASUREMENT - 2

   Slide 55


                  In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application
                  of a statistic? Assume that there is no problem with missing data, violation of assumptions, or
                  outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
                  From the list of variables "respondent's degree of religious fundamentalism" [fund], "frequency
                  of prayer" [pray], and "frequency of attendance at religious services" [attend], the most useful
                  predictor for distinguishing between groups based on responses to "attitude toward abortion
                  when there is a strong chance of serious defect in the baby" [abdefect] is "frequency of prayer"
                  [pray]. These predictors differentiate survey respondents who thought it should be possible for
                  a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby
                  from survey respondents who didn't think it should be possible for a woman to obtain a legal
                  abortion if there is a strong chance of a serious defect in the baby.
                  The most important predictor of groups based on responses to attitude toward abortion when
                  there is a strong chance of serious defect in the baby was frequency of prayer.
                  Survey respondents who didn't think it should be possible for a woman to obtain a legal
                  abortion if there is a strong chance of a serious defect in the baby prayed more often than
                  survey respondents who thought it should be possible for a woman to obtain a legal abortion if
                  there is a strong chance of a serious defect in the baby.

                                                                     "Respondent's degree of religious
                                                                     fundamentalism" [fund], "frequency of
                                                                     prayer" [pray], and "frequency of
                                                                     attendance at religious services"
                                                                     [attend] are ordinal level variables. If
                                                                     we follow the convention of treating
                                                                     ordinal level variables as metric
                                                                     variables, the level of measurement
                                                                     requirement for discriminant analysis is
                                                                     satisfied. Since some data analysts do
                                                                     not agree with this convention, a note
                                                                     of caution should be included in our
                                                                     interpretation.
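
The level of measurement rule can be written as a short check. This is a minimal Python sketch under stated assumptions: the level labels ("nominal", "ordinal", "dichotomous", "interval", "ratio") and the return values are illustrative choices, not SPSS output, and the levels assigned to each variable are the ones stated on these slides.

```python
NON_METRIC = {"nominal", "ordinal", "dichotomous"}
METRIC_OR_DICHOTOMOUS = {"interval", "ratio", "dichotomous"}

def check_level_of_measurement(dv_level, iv_levels):
    """Return 'ok', 'caution', or 'inappropriate' for a discriminant analysis."""
    # The dependent variable must be non-metric.
    if dv_level not in NON_METRIC:
        return "inappropriate"
    needs_caution = False
    for level in iv_levels.values():
        if level in METRIC_OR_DICHOTOMOUS:
            continue
        if level == "ordinal":
            # Ordinal predictors are treated as metric by convention, with a caution.
            needs_caution = True
        else:
            return "inappropriate"
    return "caution" if needs_caution else "ok"

result = check_level_of_measurement(
    "nominal",  # abdefect
    {"fund": "ordinal", "pray": "ordinal", "attend": "ordinal"},
)
print(result)
```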
                  Request stepwise discriminant analysis

   Slide 56




                                             Select the Classify |
                                             Discriminant… command
                                             from the Analyze menu.
                            Selecting the dependent variable

   Slide 57




                  First, highlight the
                  dependent variable
                  abdefect in the list
                  of variables.

                                               Second, click on the right
                                               arrow button to move the
                                               dependent variable to the
                                               Grouping Variable text box.
                     Defining the group values

   Slide 58




                  When SPSS moves the dependent variable to the
                  Grouping Variable textbox, it puts two question marks in
                  parentheses after the variable name. This is a reminder
                  that we have to enter the numbers that represent the
                  groups we want to include in the analysis.




                                First, to specify the
                                group numbers, click
                                on the Define Range…
                                button.
                          Completing the range of group values

   Slide 59




                  The value labels for abdefect
                  show two categories:
                       1 = YES
                       2 = NO

                  The range of values that we need
                  to enter goes from 1 as the
                  minimum to 2 as the maximum.

                  First, type 1 in the Minimum text box.

                  Second, type 2 in the Maximum text box.

                  Third, click on the Continue button to
                  close the dialog box.
                      Selecting the independent variables

   Slide 60




                  Move the independent
                  variables listed in the
                  problem to the
                  Independents list box.
                  Specifying the method for including variables

   Slide 61



                                     SPSS provides us with two methods for including
                                     variables: entering all of the independent variables
                                     at one time, or a stepwise method that uses a
                                     statistical test to determine the order in which
                                     variables are included.




                      Since the problem calls
                      for identifying the best
                      predictors, we click on
                      the option button to
                      Use stepwise method.
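
                      The idea behind stepwise entry can be sketched in Python. This is
                      a simplified illustration under stated assumptions, not SPSS's
                      implementation: it uses Wilks' lambda as the selection statistic
                      and 3.84 as the F-to-enter cutoff (to our knowledge the SPSS
                      defaults), it only adds variables and never removes them, and it
                      skips the tolerance check. The function names and the tiny data
                      set are hypothetical.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T) for the variables listed in cols."""
    Z = X[:, cols]
    centered = Z - Z.mean(axis=0)
    T = centered.T @ centered              # total sums of squares and cross-products
    W = np.zeros_like(T)
    for g in np.unique(y):
        D = Z[y == g] - Z[y == g].mean(axis=0)
        W += D.T @ D                       # pooled within-groups SSCP
    return np.linalg.det(W) / np.linalg.det(T)

def forward_stepwise(X, y, f_to_enter=3.84):
    """Enter variables one at a time while the partial F meets the cutoff."""
    n, p = X.shape
    g = len(np.unique(y))
    selected, current = [], 1.0
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        lambdas = {j: wilks_lambda(X, y, selected + [j]) for j in candidates}
        best = min(lambdas, key=lambdas.get)   # smallest lambda separates best
        ratio = lambdas[best] / current
        q = len(selected)                      # variables already entered
        f = (n - g - q) / (g - 1) * (1 - ratio) / ratio
        if f < f_to_enter:
            break
        selected.append(best)
        current = lambdas[best]
    return selected

# Hypothetical data: column 0 differs between the groups, column 1 does not.
X = np.array([[1, 2], [2, 1], [3, 1], [4, 2],
              [6, 2], [7, 1], [8, 1], [9, 2]], dtype=float)
y = np.array([1, 1, 1, 1, 2, 2, 2, 2])
print(forward_stepwise(X, y))  # [0]
```

                      Column 0 enters with a partial F of 30; column 1 adds nothing once
                      column 0 is in the model, so the procedure stops.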
SW388R7

                  Requesting statistics for the output
Data Analysis &
 Computers II

   Slide 62




                                  Click on the Statistics…
                                  button to select statistics
                                  we will need for the
                                  analysis.
SW388R7

                                   Specifying statistical output
Data Analysis &
 Computers II

   Slide 63




                  First, mark the Means
                  checkbox on the Descriptives
                  panel. We will use the group
                  means in our interpretation.




                  Second, mark the Univariate
                  ANOVAs checkbox on the
                  Descriptives panel. Perusing
                  these tests suggests which
                  variables might be useful
                  discriminators.




                   Third, mark the Box’s M
                   checkbox. Box’s M statistic
                   evaluates conformity to the
                   assumption of homogeneity of
                   group variances and covariances.

                   Fourth, click on the
                   Continue button to
                   close the dialog box.
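
                   What the Means and Univariate ANOVAs checkboxes request can be
                   sketched for a single variable in Python. This is a minimal
                   illustration with made-up values, not SPSS output; the function
                   name group_means_and_f is hypothetical. It computes each group's
                   mean and the one-way ANOVA F statistic that tests whether the
                   group means differ.

```python
def group_means_and_f(values, groups):
    """Return per-group means and the one-way ANOVA F for one variable."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(vs) * (means[g] - grand) ** 2
                     for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2
                    for g, vs in by_group.items() for v in vs)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return means, f

# Hypothetical scores for six cases in two groups.
values = [2, 3, 4, 6, 7, 8]
groups = [1, 1, 1, 2, 2, 2]
means, f = group_means_and_f(values, groups)
print(means)  # {1: 3.0, 2: 7.0}
print(f)      # 24.0
```

                   A large F relative to its degrees of freedom (here 1 and 4) marks
                   the variable as a candidate discriminator.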
SW388R7

                  Specifying details for the stepwise method
Data Analysis &
 Computers II

   Slide 64




                                           Click on the Method…
                                           button to set the
                                           statistical criteria used
                                           for including variables.
Discriminant analysis basicrelationships

  • 1. SW388R7 Discriminant Analysis – Basic Relationships Data Analysis & Computers II Slide 1 Discriminant Functions and Scores Describing Relationships Classification Accuracy Sample Problems
  • 2. SW388R7 Discriminant analysis Data Analysis & Computers II Slide 2  Discriminant analysis is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables.  Discriminant analysis attempts to use the independent variables to distinguish among the groups or categories of the dependent variable.  The usefulness of a discriminant model is based upon its accuracy rate, or ability to predict the known group memberships in the categories of the dependent variable.
  • 3. SW388R7 Discriminant scores Data Analysis & Computers II Slide 3  Discriminant analysis works by creating a new variable called the discriminant function score which is used to predict to which group a case belongs.  Discriminant function scores are computed similarly to factor scores, i.e. using eigenvalues. The computations find the coefficients for the independent variables that maximize the measure of distance between the groups defined by the dependent variable.  The discriminant function is similar to a regression equation in which the independent variables are multiplied by coefficients and summed to produce a score.
  • 4. SW388R7 Discriminant functions Data Analysis & Computers II Slide 4  Conceptually, we can think of the discriminant function or equation as defining the boundary between groups.  Discriminant scores are standardized, so that if the score falls on one side of the boundary (standard score less than zero, the case is predicted to be a member of one group) and if the score falls on the other side of the boundary (positive standard score), it is predicted to be a member of the other group.
  • 5. SW388R7 Number of functions Data Analysis & Computers II Slide 5  If the dependent variable defines two groups, one statistically significant discriminant function is required to distinguish the groups; if the dependent variable defines three groups, two statistically significant discriminant functions are required to distinguish among the three groups; etc.  If a discriminant function is able to distinguish among groups, it must have a strong relationship to at least one of the independent variables.  The number of possible discriminant functions in an analysis is limited to the smaller of the number of independent variables or one less than the number of groups defined by the dependent variable.
  • 6. SW388R7 Overall test of relationship Data Analysis & Computers II Slide 6  The overall test of relationship among the independent variables and groups defined by the dependent variable is a series of tests that each of the functions needed to distinguish among the groups is statistically significant.  In some analyses, we might discover that two or more of the groups defined by the dependent variable cannot be distinguished using the available independent variables. While it is reasonable to interpret a solution in which there are fewer significant discriminant functions than the maximum number possible, our problems will require that all of the possible discriminant functions be significant.
  • 7. SW388R7 Data Analysis & Interpreting the relationship between independent and dependent variables Computers II Slide 7  The interpretative statement about the relationship between the independent variable and the dependent variable is a statement like: cases in group A tended to have higher scores on variable X than cases in group B or group C.  This interpretation is complicated by the fact that the relationship is not direct, but operates through the discriminant function.  Dependent variable groups are distinguished by scores on discriminant functions, not on values of independent variables. The scores on functions are based on the values of the independent variables that are multiplied by the function coefficients.
  • 8. SW388R7 Groups, functions, and variables Data Analysis & Computers II Slide 8  To interpret the relationship between an independent variable and the dependent variable, we must first identify how the discriminant functions separate the groups, and then the role of the independent variable is for each function.  SPSS provides a table called "Functions at Group Centroids" (multivariate means) that indicates which groups are separated by which functions.  SPSS provides another table called the "Structure Matrix" which, like its counterpart in factor analysis, identifies the loading, or correlation, between each independent variable and each function. This tells us which variables to interpret for each function. Each variable is interpreted on the function that it loads most highly on.
  • 9. SW388R7 Functions at Group Centroids Data Analysis & Computers II Slide 9 In order to specify the role that each independent variable plays in predicting group membership on the dependent variable, we must link together the relationship between the discriminant functions and the groups defined by the dependent variable, the role of the significant independent variables in the discriminant functions, and the differences in group means for each of the variables. Function 2 separates Functions at Group Centroids survey respondents who thought we spend Function too little money on welfare (positive value WELFARE 1 2 of 0.235) from survey 1 -.220 .235 respondents who 2 .446 -.031 thought we spend too 3 -.311 -.362 much money (negative value of -0.362) on Unstandardized canonical discriminant welfare. We ignore the functions evaluated at group means second group (-0.031) Function 1 separates survey respondents in this comparison who thought we spend about the right because it was amount of money on welfare (the positive distinguished from the value of 0.446) from survey respondents other two groups by who thought we spend too much (negative function 1. value of -0.311) or little money (negative value of -0.220) on welfare.
  • 10. SW388R7 Structure Matrix Data Analysis & Computers II Slide 10 Based on the structure matrix, the We do not interpret predictor variables strongly associated with loadings in the discriminant function 1 which distinguished structure matrix unless between survey respondents who thought they are 0.30 or higher. we spend about the right amount of money on welfare and survey respondents who thought we spend too much or little money on welfare were number of hours worked in Structure Matrix the past week (r=-0.582) and highest year of school completed (r=0.687). Function 1 2 HIGHEST YEAR OF .687* .136 SCHOOL COMPLETED NUMBER OF HOURS -.582* .345 WORKED LAST WEEK R SELF-EMP OR WORKS .223 .889* FOR SOMEBODY RESPONDENTS INCOMEa .101 .292* Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions Variables ordered by absolute size of correlation within function. Based on the *. Largest absolute correlation between each variable and structure matrix, the predictor variable strongly associated with discriminant function 2 which any discriminant function distinguished a. between survey respondents who thought we spend too little This variable not used inand analysis. respondents money on welfare the survey who thought we spend too much money on welfare was self-employment (r=0.889).
  • 11. SW388R7 Group Statistics Data Analysis & Computers II Slide 11 Group Statistics Valid N (listwise) WELFARE Mean Std. Deviation Unweighted Weighted 1 TOO LITTLE NUMBER OF HOURS The average number of hours worked 43.96 13.240 WORKED LAST WEEK in the past week56.000 56 for survey HIGHEST YEAR OF respondents who thought we spend 13.73 2.401about the 56 right amount of money on 56.000 SCHOOL COMPLETED welfare (mean=37.90) was lower than R SELF-EMP OR WORKS 1.93 .260the average number of hours worked 56 56.000 FOR SOMEBODY in the past weeks for survey RESPONDENTS INCOME 13.70 5.034respondents who thought we spend too 56 56.000 2 ABOUT RIGHT NUMBER OF HOURS much money on welfare (mean=43.96) 37.90 13.235and survey respondents who thought 50 50.000 WORKED LAST WEEK HIGHEST YEAR OF we spend too little money on welfare 14.78 2.558(mean=42.03). 50.000 50 SCHOOL COMPLETED R SELF-EMP OR WORKS 1.90 .303This enables us to make the 50 50.000 FOR SOMEBODY statement: "survey respondents who RESPONDENTS INCOME 14.00 5.503thought we spend about the right 50 50.000 3 TOO MUCH NUMBER OF HOURS amount of money on welfare worked 42.03 10.456fewer hours in the past week than 32 32.000 WORKED LAST WEEK HIGHEST YEAR OF survey respondents who thought we 13.38 2.524spend too32 much 32.000 or little money on SCHOOL COMPLETED welfare." R SELF-EMP OR WORKS 1.75 .440 32 32.000 FOR SOMEBODY RESPONDENTS INCOME 14.75 5.304 32 32.000 Total NUMBER OF HOURS 41.32 12.846 138 138.000 WORKED LAST WEEK HIGHEST YEAR OF 14.03 2.537 138 138.000 SCHOOL COMPLETED R SELF-EMP OR WORKS
  • 12. SW388R7 Which independent variables to interpret Data Analysis & Computers II Slide 12  In a simultaneous discriminant analysis, in which all independent variables are entered together, we only interpret the relationships for independent variables that have a loading of 0.30 or higher one or more discriminant functions. A variable can have a high loading on more than one function, which complicates the interpretation. We will interpret the variable for the function on which it has the highest loading.  In a stepwise discriminant analysis, we limit the interpretation of relationships between independent variables and groups defined by the dependent variable to those independent variables that met the statistical test for inclusion in the analysis.
  • 13. SW388R7 Discriminant analysis and classification Data Analysis & Computers II Slide 13  Discriminant analysis consists of two stages: in the first stage, the discriminant functions are derived; in the second stage, the discriminant functions are used to classify the cases.  While discriminant analysis does compute correlation measures to estimate the strength of the relationship, these correlations measure the relationship between the independent variables and the discriminant scores.  A more useful measure to assess the utility of a discriminant model is classification accuracy, which compares predicted group membership based on the discriminant model to the actual, known group membership which is the value for the dependent variable.
  • 14. SW388R7 Evaluating usefulness for discriminant models Data Analysis & Computers II Slide 14  The benchmark that we will use to characterize a discriminant model as useful is a 25% improvement over the rate of accuracy achievable by chance alone.  Even if the independent variables had no relationship to the groups defined by the dependent variable, we would still expect to be correct in our predictions of group membership some percentage of the time. This is referred to as by chance accuracy.  The estimate of by chance accuracy that we will use is the proportional by chance accuracy rate, computed by summing the squared percentage of cases in each group.
  • 15. Comparing accuracy rates
      To characterize our model as useful, we compare the cross-validated accuracy rate produced by SPSS to 25% more than the proportional by chance accuracy.
      The cross-validated accuracy rate is a one-at-a-time hold-out method that classifies each case based on a discriminant solution for all of the other cases in the analysis. It is a more realistic estimate of the accuracy rate we should expect in the population because discriminant analysis inflates accuracy rates when the cases classified are the same cases used to derive the discriminant functions.
      Cross-validated accuracy rates are not produced by SPSS when separate covariance matrices are used in the classification, which we address more next week.
  • 16. Computing by chance accuracy
      The proportion of cases in each group defined by the dependent variable is reported in the table "Prior Probabilities for Groups".

      Prior Probabilities for Groups
                                  Cases Used in Analysis
      WELFARE          Prior      Unweighted   Weighted
      1 TOO LITTLE     .406       56           56.000
      2 ABOUT RIGHT    .362       50           50.000
      3 TOO MUCH       .232       32           32.000
      Total            1.000      138          138.000

      The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0.406² + 0.362² + 0.232² = 0.350). A 25% increase over this would require that our cross-validated accuracy be 43.7% (1.25 x 35.0% = 43.7%).
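The arithmetic on this slide can be reproduced in a few lines. The following is a minimal sketch, not SPSS output; the function name `proportional_chance_accuracy` is an illustrative choice, and the priors are copied from the "Prior Probabilities for Groups" table.

```python
# Prior probabilities copied from the "Prior Probabilities for Groups" table.
priors = {"TOO LITTLE": 0.406, "ABOUT RIGHT": 0.362, "TOO MUCH": 0.232}


def proportional_chance_accuracy(proportions):
    """Sum of the squared group proportions (hypothetical helper name)."""
    return sum(p ** 2 for p in proportions)


chance = proportional_chance_accuracy(priors.values())
criterion = 1.25 * chance  # 25% improvement over by chance accuracy

print(f"proportional by chance accuracy: {chance:.3f}")
print(f"criterion (1.25 x chance):       {criterion:.3f}")
```

The cross-validated accuracy rate reported by SPSS is then compared against the `criterion` value.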
  • 17. Comparing the cross-validated accuracy rate

      Classification Results (footnotes b, c)
      Rows are actual WELFARE groups; columns are predicted group membership.

                                  1 TOO LITTLE   2 ABOUT RIGHT   3 TOO MUCH   Total
      Original, count
        1 TOO LITTLE              43             15              6            64
        2 ABOUT RIGHT             26             30              6            62
        3 TOO MUCH                17             10              9            36
        Ungrouped cases           3              3               2            8
      Original, %
        1 TOO LITTLE              67.2           23.4            9.4          100.0
        2 ABOUT RIGHT             41.9           48.4            9.7          100.0
        3 TOO MUCH                47.2           27.8            25.0         100.0
        Ungrouped cases           37.5           37.5            25.0         100.0
      Cross-validated (footnote a), count
        1 TOO LITTLE              43             15              6            64
        2 ABOUT RIGHT             26             30              6            62
        3 TOO MUCH                17             11              8            36
      Cross-validated (footnote a), %
        1 TOO LITTLE              67.2           23.4            9.4          100.0
        2 ABOUT RIGHT             41.9           48.4            9.7          100.0
        3 TOO MUCH                47.2           30.6            22.2         100.0

      a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
      b. 50.6% of original grouped cases correctly classified.
      c. 50.0% of cross-validated grouped cases correctly classified.

      SPSS reports the cross-validated accuracy rate in the footnotes to the table "Classification Results." The cross-validated accuracy rate computed by SPSS was 50.0%, which was greater than or equal to the proportional by chance accuracy criteria of 43.7%.
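The accuracy rates in footnotes b and c are the share of grouped cases on the diagonal of the classification table. A minimal sketch of that calculation follows, assuming the counts are read from the table as rows of actual groups by columns of predicted groups; `accuracy_rate` is an illustrative name, not an SPSS function.

```python
# Counts for grouped cases only (ungrouped cases are excluded).
# Rows: actual group; columns: predicted group.
original = [[43, 15, 6], [26, 30, 6], [17, 10, 9]]
cross_validated = [[43, 15, 6], [26, 30, 6], [17, 11, 8]]


def accuracy_rate(matrix):
    """Correctly classified cases (the diagonal) divided by all cases."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


original_rate = accuracy_rate(original)
cross_validated_rate = accuracy_rate(cross_validated)

print(f"original:        {original_rate:.1%}")
print(f"cross-validated: {cross_validated_rate:.1%}")
```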
  • 18. Problem 1
      1. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
      The variables "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income" [rincom98] are useful in distinguishing between groups based on responses to "seen x-rated movie in last year" [xmovie]. These predictors differentiate survey respondents who had seen an x-rated movie in the last year from survey respondents who had not seen an x-rated movie in the last year.
      Survey respondents who had seen an x-rated movie in the last year were younger than survey respondents who had not seen an x-rated movie in the last year. Survey respondents who had seen an x-rated movie in the last year were more likely to be male than survey respondents who had not seen an x-rated movie in the last year.
      1. True
      2. True with caution
      3. False
      4. Inappropriate application of a statistic
  • 19. Dissecting problem 1 - 1
      Annotations to the problem statement:
      For these problems, we will assume that there is no problem with missing data, violation of assumptions, or outliers.
      In this problem, we are told to use 0.05 as alpha for the discriminant analysis.
  • 20. Dissecting problem 1 - 2
      Annotations to the problem statement:
      The variables listed first in the problem statement are the independent variables (IVs): "age" [age], "highest year of school completed" [educ], "sex" [sex], and "income" [rincom98].
      The variable used to define groups is the dependent variable (DV): "seen x-rated movie in last year" [xmovie].
      When a problem states that a list of independent variables can distinguish among groups, we do a discriminant analysis entering all of the variables simultaneously.
  • 21. Dissecting problem 1 - 3
      Annotations to the problem statement:
      The problem identifies two groups for the dependent variable:
        • survey respondents who had seen an x-rated movie in the last year
        • survey respondents who had not seen an x-rated movie in the last year
      To distinguish between two groups, the analysis will be required to find one statistically significant discriminant function.
  • 22. Dissecting problem 1 - 4
      Annotations to the problem statement:
      The specific relationships listed in the problem indicate how the independent variable relates to groups of the dependent variable, i.e., the mean for age will be lower for respondents who had seen an x-rated movie in the last year.
      For the statement about the discriminant analysis to be true, we must have enough statistically significant functions to distinguish among the groups, the classification accuracy rate must be substantially better than could be obtained by chance alone, and each significant relationship must be interpreted correctly.
  • 23. LEVEL OF MEASUREMENT - 1
      Annotations to the problem statement:
      Discriminant analysis requires that the dependent variable be non-metric and the independent variables be metric or dichotomous.
      "Seen x-rated movie in last year" [xmovie] is a dichotomous variable, which satisfies the level of measurement requirement. It contains two categories: survey respondents who had seen an x-rated movie in the last year and survey respondents who had not seen an x-rated movie in the last year.
  • 24. LEVEL OF MEASUREMENT - 2
      Annotations to the problem statement:
      "Age" [age] and "highest year of school completed" [educ] are interval level variables, which satisfies the level of measurement requirements for discriminant analysis.
      "Income" [rincom98] is an ordinal level variable. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for discriminant analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
      "Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in discriminant analysis.
  • 25. Request simultaneous discriminant analysis
      Select the Classify | Discriminant… command from the Analyze menu.
  • 26. Selecting the dependent variable
      First, highlight the dependent variable xmovie in the list of variables.
      Second, click on the right arrow button to move the dependent variable to the Grouping Variable text box.
  • 27. Defining the group values
      When SPSS moves the dependent variable to the Grouping Variable text box, it puts two question marks in parentheses after the variable name. This is a reminder that we have to enter the numbers that represent the groups we want to include in the analysis.
      First, to specify the group numbers, click on the Define Range… button.
  • 28. Completing the range of group values
      The value labels for xmovie show two categories: 1 = YES, 2 = NO. The range of values that we need to enter goes from 1 as the minimum to 2 as the maximum.
      First, type in 1 in the Minimum text box.
      Second, type in 2 in the Maximum text box.
      Third, click on the Continue button to close the dialog box.
  • 29. Selecting the independent variables
      Move the independent variables listed in the problem to the Independents list box.
  • 30. Specifying the method for including variables
      SPSS provides us with two methods for including variables: entering all of the independent variables at one time, and a stepwise method for selecting variables using a statistical test to determine the order in which variables are included.
      Since the problem states that there is a relationship without requesting the best predictors, we accept the default to Enter independents together.
  • 31. Requesting statistics for the output
      Click on the Statistics… button to select statistics we will need for the analysis.
  • 32. Specifying statistical output
      First, mark the Means checkbox on the Descriptives panel. We will use the group means in our interpretation.
      Second, mark the Univariate ANOVAs checkbox on the Descriptives panel. Perusing these tests suggests which variables might be useful discriminators.
      Third, mark the Box’s M checkbox. Box’s M statistic evaluates conformity to the assumption of homogeneity of group variances.
      Fourth, click on the Continue button to close the dialog box.
  • 33. Specifying details for classification
      Click on the Classify… button to specify details for the classification phase of the analysis.
  • 34. Details for classification - 1
      First, mark the option button to Compute from group sizes on the Prior Probabilities panel. This incorporates the size of the groups defined by the dependent variable into the classification of cases using the discriminant functions.
      Second, mark the Casewise results checkbox on the Display panel to include classification details for each case in the output.
      Third, mark the Summary table checkbox to include summary tables comparing actual and predicted classification.
  • 35. Details for classification - 2
      Fourth, mark the Leave-one-out classification checkbox to request that SPSS include a cross-validated classification in the output.
      This option produces a less biased estimate of classification accuracy by sequentially holding each case out of the calculations for the discriminant functions, and using the derived functions to classify the case held out.
  • 36. Details for classification - 3
      Fifth, accept the default of the Within-groups option button on the Use Covariance Matrix panel. The covariance matrices are the measure of the dispersion in the groups defined by the dependent variable. If we fail the homogeneity of group variances test (Box’s M), our option is to use the Separate-groups covariance matrices in classification.
      Sixth, mark the Combined-groups checkbox on the Plots panel to obtain a visual plot of the relationship between functions and groups defined by the dependent variable.
      Seventh, click on the Continue button to close the dialog box.
  • 37. Completing the discriminant analysis request
      Click on the OK button to request the output for the discriminant analysis.
  • 38. Sample size – ratio of cases to variables

      Analysis Case Processing Summary
      Unweighted Cases                                                    N      Percent
      Valid                                                               119    44.1
      Excluded
        Missing or out-of-range group codes                               49     18.1
        At least one missing discriminating variable                      66     24.4
        Both missing or out-of-range group codes and at least
          one missing discriminating variable                             36     13.3
        Total                                                             151    55.9
      Total                                                               270    100.0

      The minimum ratio of valid cases to independent variables for discriminant analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 119 valid cases and 4 independent variables. The ratio of cases to independent variables is 29.75 to 1, which satisfies the minimum requirement. In addition, the ratio of 29.75 to 1 satisfies the preferred ratio of 20 to 1.
  • 39. Sample size – minimum group size

      Prior Probabilities for Groups
                             Cases Used in Analysis
      XMOVIE      Prior      Unweighted   Weighted
      1           .311       37           37.000
      2           .689       82           82.000
      Total       1.000      119          119.000

      In addition to the requirement for the ratio of cases to independent variables, discriminant analysis requires that there be a minimum number of cases in the smallest group defined by the dependent variable. The number of cases in the smallest group must be larger than the number of independent variables, and the group should preferably contain 20 or more cases.
      The number of cases in the smallest group in this problem is 37, which is larger than the number of independent variables (4), satisfying the minimum requirement. In addition, the number of cases in the smallest group satisfies the preferred minimum of 20 cases.
      If the sample size does not initially satisfy the minimum requirements, discriminant analysis is not appropriate.
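The two sample size rules (ratio of cases to variables, and size of the smallest group) can be checked together. This is a minimal sketch using the thresholds stated on the slides; the function `check_sample_size` and its dictionary keys are hypothetical names chosen for illustration.

```python
def check_sample_size(valid_cases, n_predictors, group_sizes):
    """Apply the sample size rules from the slides (illustrative helper)."""
    ratio = valid_cases / n_predictors
    smallest = min(group_sizes)
    return {
        "ratio": ratio,
        "ratio_minimum_met": ratio >= 5,       # minimum ratio of 5 to 1
        "ratio_preferred_met": ratio >= 20,    # preferred ratio of 20 to 1
        "smallest_group": smallest,
        "group_minimum_met": smallest > n_predictors,
        "group_preferred_met": smallest >= 20,
    }


# Values from the xmovie analysis: 119 valid cases, 4 predictors,
# group sizes 37 and 82.
result = check_sample_size(119, 4, [37, 82])
for name, value in result.items():
    print(f"{name}: {value}")
```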
  • 40. NUMBER OF DISCRIMINANT FUNCTIONS - 1
      The maximum possible number of discriminant functions is the smaller of one less than the number of groups defined by the dependent variable and the number of independent variables.
      In this analysis there were 2 groups defined by "seen x-rated movie in last year" and 4 independent variables, so the maximum possible number of discriminant functions was 1.
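The rule above reduces to a single `min` call. A minimal sketch follows; the second call uses a hypothetical three-group, four-predictor design purely to show a case where more than one function is possible.

```python
def max_discriminant_functions(n_groups, n_predictors):
    """Smaller of (groups - 1) and the number of independent variables."""
    return min(n_groups - 1, n_predictors)


# The xmovie analysis: 2 groups, 4 independent variables.
xmovie_max = max_discriminant_functions(2, 4)

# Hypothetical design for illustration only: 3 groups, 4 predictors.
three_group_max = max_discriminant_functions(3, 4)

print(xmovie_max, three_group_max)
```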
  • 41. NUMBER OF DISCRIMINANT FUNCTIONS - 2
      In the table of Wilks' Lambda, which tested functions for statistical significance, the direct analysis identified 1 discriminant function that was statistically significant. The Wilks' lambda statistic for the test of function 1 (chi-square=24.159) had a probability of <0.001, which was less than or equal to the level of significance of 0.05.
      The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 1 discriminant function.
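The reported probability can be checked from the chi-square value. The slide does not state the degrees of freedom; the sketch below assumes df = 4 (number of independent variables times one less than the number of groups), which is an assumption and not a value taken from the output. It uses the closed-form upper-tail probability that holds for even degrees of freedom, so no statistics library is needed.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of df.

    Uses the identity Q(x; 2m) = exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


chi_square = 24.159      # from the slide
assumed_df = 4           # ASSUMPTION: 4 predictors x (2 groups - 1)
p_value = chi_square_upper_tail_even_df(chi_square, assumed_df)

print(f"p = {p_value:.6f}")
print("significant at 0.05:", p_value <= 0.05)
```

Under that assumption the probability is well below 0.001, consistent with the value reported on the slide.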
  • 42. Independent variables and group membership: relationship of functions to groups
      In order to specify the role that each independent variable plays in predicting group membership on the dependent variable, we must link together the relationship between the discriminant functions and the groups defined by the dependent variable, the role of the significant independent variables in the discriminant functions, and the differences in group means for each of the variables.
      Each function divides the groups into two subgroups by assigning negative values to one subgroup and positive values to the other subgroup. Function 1 separates survey respondents who had seen an x-rated movie in the last year (-.714) from survey respondents who had not seen an x-rated movie in the last year (.322).

      Functions at Group Centroids
      XMOVIE      Function 1
      1           -.714
      2           .322
      Unstandardized canonical discriminant functions evaluated at group means
  • 43. Independent variables and group membership: predictor loadings on functions
      We do not interpret loadings in the structure matrix unless they are 0.30 or higher.
      Based on the structure matrix, the predictor variables strongly associated with discriminant function 1, which distinguished between survey respondents who had seen an x-rated movie in the last year and survey respondents who had not seen an x-rated movie in the last year, were age (r=0.467) and sex (r=0.770).

      Structure Matrix
      Variable      Function 1
      SEX           .770
      AGE           .467
      EDUC          .118
      RINCOM98      .044
      Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
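The 0.30 rule amounts to filtering the structure matrix. A minimal sketch follows; comparing the absolute value of each loading against the cutoff is an assumption here (the slide says "0.30 or higher", and the table is ordered by absolute size), and the names are illustrative.

```python
# Loadings on function 1, copied from the Structure Matrix table.
loadings = {"SEX": 0.770, "AGE": 0.467, "EDUC": 0.118, "RINCOM98": 0.044}

CUTOFF = 0.30


def interpretable_predictors(structure, cutoff=CUTOFF):
    """Variables whose loading is at least `cutoff` in absolute value,
    ordered from largest to smallest absolute loading."""
    kept = [name for name, r in structure.items() if abs(r) >= cutoff]
    return sorted(kept, key=lambda name: abs(structure[name]), reverse=True)


selected = interpretable_predictors(loadings)
print(selected)
```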
  • 44. Independent variables and group membership: predictors associated with first function - 1

      Group Statistics
                                                       Valid N (listwise)
      XMOVIE   Variable    Mean     Std. Deviation     Unweighted   Weighted
      1        AGE         37.24    10.838             37           37.000
               EDUC        13.86    2.720              37           37.000
               SEX         1.27     .450               37           37.000
               RINCOM98    13.76    5.209              37           37.000
      2        AGE         42.70    11.461             82           82.000
               EDUC        14.18    2.534              82           82.000
               SEX         1.65     .481               82           82.000
               RINCOM98    14.00    5.308              82           82.000
      Total    AGE         41.00    11.508             119          119.000
               EDUC        14.08    2.586              119          119.000
               SEX         1.53     .501               119          119.000
               RINCOM98    13.92    5.256              119          119.000

      The average age for survey respondents who had seen an x-rated movie in the last year (mean=37.24) was lower than the average age for survey respondents who had not seen an x-rated movie in the last year (mean=42.70).
      This supports the relationship that "survey respondents who had seen an x-rated movie in the last year were younger than survey respondents who had not seen an x-rated movie in the last year."
  • 45. Independent variables and group membership: predictors associated with first function - 2

      Group Statistics (same table as on slide 44; SEX rows shown)
                                                       Valid N (listwise)
      XMOVIE   Variable    Mean     Std. Deviation     Unweighted   Weighted
      1        SEX         1.27     .450               37           37.000
      2        SEX         1.65     .481               82           82.000
      Total    SEX         1.53     .501               119          119.000

      Since sex is a dichotomous variable, the mean is not directly interpretable. Its interpretation must take into account the coding by which 1 corresponds to male and 2 corresponds to female. The lower mean for survey respondents who had seen an x-rated movie in the last year (mean=1.27), when compared to the mean for survey respondents who had not seen an x-rated movie in the last year (mean=1.65), implies that the group contained more survey respondents who were male and fewer survey respondents who were female.
      This supports the relationship that "survey respondents who had seen an x-rated movie in the last year were more likely to be male than survey respondents who had not seen an x-rated movie in the last year."
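With the 1 = male, 2 = female coding, the group mean converts directly into group composition: the share of females is the mean minus 1, and the share of males is 2 minus the mean. A minimal sketch of that conversion, using the means from the Group Statistics table (the function name is illustrative):

```python
def shares_from_mean(mean_code, low_code=1, high_code=2):
    """Convert the mean of a two-valued code into category shares.

    Returns (share coded low_code, share coded high_code).
    """
    share_high = (mean_code - low_code) / (high_code - low_code)
    return 1.0 - share_high, share_high


# SEX is coded 1 = male, 2 = female.
male_seen, female_seen = shares_from_mean(1.27)        # xmovie group 1
male_not_seen, female_not_seen = shares_from_mean(1.65)  # xmovie group 2

print(f"seen x-rated movie:     {male_seen:.0%} male, {female_seen:.0%} female")
print(f"not seen x-rated movie: {male_not_seen:.0%} male, {female_not_seen:.0%} female")
```

Because the published means are rounded to two decimals, the shares are approximate.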
  • 46. CLASSIFICATION USING THE DISCRIMINANT MODEL: by chance accuracy rate
      The independent variables could be characterized as useful predictors of membership in the groups defined by the dependent variable if the cross-validated classification accuracy rate was significantly higher than the accuracy attainable by chance alone.
      Operationally, the cross-validated classification accuracy rate should be at least 25% higher than the proportional by chance accuracy rate.
      The proportional by chance accuracy rate was computed by squaring and summing the proportion of cases in each group from the table of prior probabilities for groups (0.311² + 0.689² = 0.571).

      Prior Probabilities for Groups
                             Cases Used in Analysis
      XMOVIE      Prior      Unweighted   Weighted
      1           .311       37           37.000
      2           .689       82           82.000
      Total       1.000      119          119.000
  • 47. CLASSIFICATION USING THE DISCRIMINANT MODEL: criteria for classification accuracy

      Classification Results (footnotes b, c)
      Rows are actual XMOVIE groups; columns are predicted group membership.

                                  1        2        Total
      Original, count
        1                         15       22       37
        2                         12       70       82
        Ungrouped cases           13       36       49
      Original, %
        1                         40.5     59.5     100.0
        2                         14.6     85.4     100.0
        Ungrouped cases           26.5     73.5     100.0
      Cross-validated (footnote a), count
        1                         15       22       37
        2                         12       70       82
      Cross-validated (footnote a), %
        1                         40.5     59.5     100.0
        2                         14.6     85.4     100.0

      a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
      b. 71.4% of original grouped cases correctly classified.
      c. 71.4% of cross-validated grouped cases correctly classified.

      The cross-validated accuracy rate computed by SPSS was 71.4%, which was greater than or equal to the proportional by chance accuracy criteria of 71.4% (1.25 x 57.1% = 71.4%). The criteria for classification accuracy is satisfied.
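In this problem the accuracy rate and the criterion are tied at the one-decimal precision that SPSS reports, so the outcome depends on how the comparison is made. The sketch below follows the slide and compares the two percentages after rounding to one decimal; that rounding step is an assumption about the intended procedure, and `meets_criterion` is a hypothetical helper name. At more decimal places the two values differ by a few thousandths of a percentage point.

```python
# Cross-validated counts for grouped cases (rows: actual, columns: predicted).
cross_validated = [[15, 22], [12, 70]]
priors = [0.311, 0.689]  # from the "Prior Probabilities for Groups" table

correct = sum(cross_validated[i][i] for i in range(len(cross_validated)))
total = sum(sum(row) for row in cross_validated)
accuracy = correct / total

chance = sum(p ** 2 for p in priors)
criterion = 1.25 * chance


def meets_criterion(accuracy, criterion, decimals=1):
    """Compare percentages at the precision reported on the slide
    (ASSUMPTION: both values are rounded to `decimals` places first)."""
    return round(accuracy * 100, decimals) >= round(criterion * 100, decimals)


print(f"cross-validated accuracy: {accuracy * 100:.1f}%")
print(f"criterion:                {criterion * 100:.1f}%")
print("criterion met at reported precision:", meets_criterion(accuracy, criterion))
```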
  • 48. Answering the question in problem 1 - 1
      Annotations to the problem statement:
      We found one statistically significant discriminant function, making it possible to distinguish between the two groups defined by the dependent variable.
      Moreover, the cross-validated classification accuracy met the by chance accuracy criteria, supporting the utility of the model.
  • 49. Answering the question in problem 1 - 2
      Annotations to the problem statement:
      We verified that each statement about the relationship between predictors and groups was correct.
      The answer to the question is true with caution. A caution is added because of the inclusion of ordinal level variables.
  • 50. Problem 2
      In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers. Use a level of significance of 0.05 for evaluating the statistical relationship.
      From the list of variables "respondent's degree of religious fundamentalism" [fund], "frequency of prayer" [pray], and "frequency of attendance at religious services" [attend], the most useful predictor for distinguishing between groups based on responses to "attitude toward abortion when there is a strong chance of serious defect in the baby" [abdefect] is "frequency of prayer" [pray]. These predictors differentiate survey respondents who thought it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby from survey respondents who didn't think it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby.
      The most important predictor of groups based on responses to attitude toward abortion when there is a strong chance of serious defect in the baby was frequency of prayer. Survey respondents who didn't think it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby prayed more often than survey respondents who thought it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby.
      1. True
      2. True with caution
      3. False
      4. Inappropriate application of a statistic
  • 51. Dissecting problem 2 - 1
      Annotations to the problem statement:
      The variables listed first in the problem statement are the independent variables (IVs): "respondent's degree of religious fundamentalism" [fund], "frequency of prayer" [pray], and "frequency of attendance at religious services" [attend].
      The variable used to define groups is the dependent variable (DV): "attitude toward abortion when there is a strong chance of serious defect in the baby" [abdefect].
      When a problem asks us to identify the best or most useful predictors from a list of independent variables, we do stepwise discriminant analysis.
  • 52. SW388R7 Dissecting problem 2 - 2 Data Analysis & Computers II Slide 52 The problem identifies two groups for the dependent variable: (1) survey respondents who thought it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby, and (2) survey respondents who didn't think it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby. To distinguish between two groups, the analysis will be required to find one statistically significant discriminant function. The importance of predictors is based upon the stepwise addition of variables to the analysis.
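The requirement of one discriminant function for two groups follows from a general counting rule: at most min(number of groups - 1, number of independent variables) functions can be extracted. The sketch below illustrates that rule only; the helper name `max_discriminant_functions` is hypothetical and is not part of SPSS or of the original slides.

```python
def max_discriminant_functions(n_groups: int, n_predictors: int) -> int:
    """Upper bound on the number of discriminant functions.

    Hypothetical helper: the bound is the smaller of (groups - 1)
    and the number of independent variables.
    """
    if n_groups < 2:
        raise ValueError("discriminant analysis needs at least two groups")
    if n_predictors < 1:
        raise ValueError("at least one independent variable is required")
    return min(n_groups - 1, n_predictors)


# Problem 2: abdefect has two groups; fund, pray, and attend are the predictors.
print(max_discriminant_functions(n_groups=2, n_predictors=3))  # prints 1
```

With a three-group dependent variable and the same three predictors, the bound would be two functions, and the problem would then need both to be statistically significant to distinguish all groups.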
  • 53. SW388R7 Dissecting problem 2 - 3 Data Analysis & Computers II Slide 53 The specific relationships listed in the problem indicate how the independent variable relates to groups of the dependent variable, i.e., the mean for frequency of prayer will be lower for respondents who thought it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby compared to survey respondents who didn't think it should be possible for a woman to obtain a legal abortion if there is a strong chance of a serious defect in the baby. In a stepwise analysis, we only interpret the independent variables that are entered in the stepwise analysis. In order for a stepwise analysis to be true, we must have enough statistically significant functions to distinguish among the groups, the order of entry must be correct, and each significant relationship must be interpreted correctly.
  • 54. SW388R7 LEVEL OF MEASUREMENT - 1 Data Analysis & Computers II Slide 54 Discriminant analysis requires that the dependent variable be non-metric and the independent variables be metric or dichotomous. "Attitude toward abortion when there is a strong chance of serious defect in the baby" [abdefect] is a nominal level variable, which satisfies the level of measurement requirement.
  • 55. SW388R7 LEVEL OF MEASUREMENT - 2 Data Analysis & Computers II Slide 55 "Respondent's degree of religious fundamentalism" [fund], "frequency of prayer" [pray], and "frequency of attendance at religious services" [attend] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for discriminant analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
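The level-of-measurement screening on slides 54 and 55 can be written as a small decision rule. This is a minimal sketch under stated assumptions: the function name `check_levels`, the level labels, and the three verdict strings are hypothetical, and the rule encodes only what the slides say (non-metric dependent variable; metric or dichotomous independent variables; ordinal independent variables accepted by convention, with a caution).

```python
def check_levels(dv_level: str, iv_levels: dict) -> str:
    """Return 'ok', 'caution', or 'inappropriate' (hypothetical verdict labels)."""
    # The dependent variable must be non-metric.
    if dv_level not in ("nominal", "ordinal", "dichotomous"):
        return "inappropriate"

    verdict = "ok"
    for level in iv_levels.values():
        if level in ("interval", "ratio", "dichotomous"):
            continue  # metric or dichotomous: requirement satisfied
        if level == "ordinal":
            verdict = "caution"  # treated as metric by convention only
        else:
            return "inappropriate"  # e.g. a multi-category nominal predictor
    return verdict


# Levels as stated on the slides for problem 2.
print(check_levels("nominal", {"fund": "ordinal", "pray": "ordinal", "attend": "ordinal"}))
# prints caution
```

A verdict of "caution" corresponds to the "True with caution" answer option being the best the problem can achieve, provided the statistical results also support the statement.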
  • 56. SW388R7 Request stepwise discriminant analysis Data Analysis & Computers II Slide 56 Select the Classify | Discriminant… command from the Analyze menu.
  • 57. SW388R7 Selecting the dependent variable Data Analysis & Computers II Slide 57 First, highlight the dependent variable abdefect in the list of variables. Second, click on the right arrow button to move the dependent variable to the Grouping Variable text box.
  • 58. SW388R7 Defining the group values Data Analysis & Computers II Slide 58 When SPSS moves the dependent variable to the Grouping Variable text box, it puts two question marks in parentheses after the variable name. This is a reminder that we have to enter the numbers that represent the groups we want to include in the analysis. First, to specify the group numbers, click on the Define Range… button.
  • 59. SW388R7 Completing the range of group values Data Analysis & Computers II Slide 59 The value labels for abdefect show two categories: 1 = YES, 2 = NO. The range of values that we need to enter goes from 1 as the minimum to 2 as the maximum. First, type in 1 in the Minimum text box. Second, type in 2 in the Maximum text box. Third, click on the Continue button to close the dialog box.
  • 60. SW388R7 Selecting the independent variables Data Analysis & Computers II Slide 60 Move the independent variables listed in the problem to the Independents list box.
  • 61. SW388R7 Specifying the method for including variables Data Analysis & Computers II Slide 61 SPSS provides us with two methods for including variables: to enter all of the independent variables at one time, and a stepwise method for selecting variables using a statistical test to determine the order in which variables are included. Since the problem calls for identifying the best predictors, we click on the option button to Use stepwise method.
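To make the idea of "a statistical test to determine the order in which variables are included" concrete, the sketch below shows only the first step of a stepwise selection, assuming Wilks' lambda as the selection statistic. For a single predictor, Wilks' lambda is the within-group sum of squares divided by the total sum of squares, and the candidate with the smallest value enters first. The function names and the six-case data set are hypothetical illustrations, not GSS2000.sav values, and the sketch omits the entry and removal thresholds and the later steps that a full stepwise procedure applies.

```python
def wilks_lambda_single(values, groups):
    """Wilks' lambda for one predictor: within-group SS / total SS."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values belonging to each group.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    ss_within = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        ss_within += sum((v - group_mean) ** 2 for v in members)
    return ss_within / ss_total


def first_variable_entered(data, groups):
    """Pick the candidate with the smallest single-variable Wilks' lambda."""
    lambdas = {name: wilks_lambda_single(vals, groups) for name, vals in data.items()}
    return min(lambdas, key=lambdas.get), lambdas


# Hypothetical toy data: six cases, group codes 1 and 2.
groups = [1, 1, 1, 2, 2, 2]
data = {
    "fund": [2, 1, 3, 2, 3, 1],
    "pray": [4, 5, 4, 1, 2, 1],
    "attend": [3, 4, 2, 5, 3, 4],
}
best, lambdas = first_variable_entered(data, groups)
print(best)  # prints pray (for this toy data only)
```

A lambda near 0 means the groups differ strongly on that variable; a lambda of 1 means the group means are identical.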
  • 62. SW388R7 Requesting statistics for the output Data Analysis & Computers II Slide 62 Click on the Statistics… button to select statistics we will need for the analysis.
  • 63. SW388R7 Specifying statistical output Data Analysis & Computers II Slide 63 First, mark the Means checkbox on the Descriptives panel. We will use the group means in our interpretation. Second, mark the Univariate ANOVAs checkbox on the Descriptives panel. Perusing these tests suggests which variables might be useful discriminators. Third, mark the Box’s M checkbox. Box’s M statistic evaluates conformity to the assumption of homogeneity of group covariance matrices. Fourth, click on the Continue button to close the dialog box.
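The Means and Univariate ANOVAs options correspond to two simple computations per predictor: the mean within each group, and a one-way ANOVA F statistic comparing those means. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the six data values are invented for the example rather than taken from GSS2000.sav.

```python
def group_means(values, groups):
    """Mean of the predictor within each group."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return {g: sum(m) / len(m) for g, m in by_group.items()}


def anova_f(values, groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    means = group_means(values, groups)
    grand_mean = sum(values) / len(values)
    counts = {g: groups.count(g) for g in means}

    ss_between = sum(counts[g] * (means[g] - grand_mean) ** 2 for g in means)
    ss_within = sum((v - means[g]) ** 2 for v, g in zip(values, groups))

    df_between = len(means) - 1
    df_within = len(values) - len(means)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy data: six cases, group codes 1 and 2.
groups = [1, 1, 1, 2, 2, 2]
pray = [4, 5, 4, 1, 2, 1]

means = group_means(pray, groups)
f_stat = anova_f(pray, groups)
print(means)
print(round(f_stat, 2))  # prints 40.5 for this toy data
```

A large F relative to its critical value at the chosen significance level suggests the predictor separates the groups on its own, which is why these tests are worth perusing before the stepwise results.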
  • 64. SW388R7 Specifying details for the stepwise method Data Analysis & Computers II Slide 64 Click on the Method… button to specify the statistical criteria to use for including variables.