UNCLASSIFIED / FOUO





                          National Guard
                         Black Belt Training
                              Module 37

                          Multiple Regression






CPI Roadmap – Analyze

                                     8-STEP PROCESS
   1. Validate the Problem
   2. Identify Performance Gaps
   3. Set Improvement Targets
   4. Determine Root Cause
   5. Develop Counter-Measures
   6. See Counter-Measures Through
   7. Confirm Results & Process
   8. Standardize Successful Processes

        Define        Measure        Analyze        Improve        Control

                       ACTIVITIES
                       •   Identify Potential Root Causes
                       •   Reduce List of Potential Root Causes
                       •   Confirm Root Cause to Output Relationship
                       •   Estimate Impact of Root Causes on Key Outputs
                       •   Prioritize Root Causes
                       •   Complete Analyze Tollgate

                       TOOLS
                       •   Value Stream Analysis
                       •   Process Constraint ID
                       •   Takt Time Analysis
                       •   Cause and Effect Analysis
                       •   Brainstorming
                       •   5 Whys
                       •   Affinity Diagram
                       •   Pareto
                       •   Cause and Effect Matrix
                       •   FMEA
                       •   Hypothesis Tests
                       •   ANOVA
                       •   Chi Square
                       •   Simple and Multiple Regression

                       Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.




   Learning Objectives
     • Understand how to identify correlation with multiple variables
     • Learn how to create a mathematical model for the effect of multiple inputs
       on an output variable
     • Understand and identify multicollinearity
     • Understand how to use best subsets to identify the best model
     • Examine unusual observations to learn more about the data



                                Multiple Regression    UNCLASSIFIED / FOUO   3




   Multiple Regression
     • In Simple Linear Regression, we had:
           Y = B0 + B1X
     • In Multiple Linear Regression, we have:
           Y = B0 + B1X1 + B2X2 + B3X3
     • We'd like to identify which, if any, of the predictor variables are
       useful in predicting Y

     [Figure: Y = f(X), with inputs X1 through X5 feeding the output Y]
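The coefficients B0, B1, ..., Bk are estimated by the method of least squares. As a minimal sketch (not Minitab's internal algorithm), the Python below builds the normal equations (X'X)B = X'Y and solves them by Gaussian elimination. The data are hypothetical and noise-free, generated from Y = 1 + 2X1 + 3X2, so the fit should recover those coefficients; `solve` and `fit_least_squares` are illustrative names.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_least_squares(xs, ys):
    """Return [B0, B1, ..., Bk] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1 multiplies the constant B0
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(xtx, xty)  # normal equations: (X'X) B = X'Y


# Hypothetical data: Y = 1 + 2*X1 + 3*X2 exactly (no noise).
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
Y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coefs = fit_least_squares(X, Y)
print([round(b, 6) for b in coefs])  # approximately [1.0, 2.0, 3.0]
```

Forming X'X is fine for a small illustration, but it squares the condition number of the problem; statistical software typically uses more stable decompositions such as QR.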







When Should I Use Multiple Regression?

                                          Independent Variable (X)
      Dependent Variable (Y)       Continuous               Attribute
      Continuous                   Regression               ANOVA
      Attribute                    Logistic Regression      Chi-Square (χ²) Test

    The tool depends on the data type. Regression is typically used with a continuous
    input and a continuous response, but it may also be used with count or categorical
    inputs and outputs.




   Basic Steps for Regression Modeling

   STEP                              OBJECTIVE                              KEY QUESTION
   1. Process Flowchart, SIPOC       To identify KPIVs and KPOVs            Which KPIVs will significantly improve which KPOVs?
   2. Scatter Plot, Histogram        To visualize the data                  Does it look like there is a C&E relationship?
   3. Correlation,                   To qualify the C&E relationship        How strong is the C&E relationship?
      Hypothesis Test                (Strength, % Variability, P-value)
   4. Regression Analysis            To quantify the C&E relationship       What is the prediction equation?
                                     (Method of Least Squares)
   5. Residual Analysis              To validate the model selected         Is there anything suspicious with the model selected?

           KPIV = Key Process Input Variables          KPOV = Key Process Output Variables




   Example: Production Plant
     • A chemical engineer is investigating the amount of silver required in the
       high-volume production of contact switches for a new Army radio. Although only
       a small amount of silver is deposited on the switches, a larger amount is wasted
       through a multiple-step process. She has collected data (A-06 Production Plant)
       and would like to develop a prediction model.
     • Step 1: The variables identified as KPIVs, and the KPOV, are given below:
           X1 = Average temperature of rinse bath (degrees C)
           X2 = Speed of reel that feeds the switches through the line (inches/min)
           X3 = Thickness of silver deposit (angstroms)
           X4 = Water consumed (gallons per day)
           Y  = Amount of silver consumed (pounds/day)

     What questions would you ask about this data?
Source: Applied Regression Analysis, Draper and Smith





     Visualize the Data!

   Step 2: Visualize the Data

   Data file: A-06 Production Plant.mtw
   Select Graph>Matrix Plot




      Step 2: Visualize the Data!
  Looking for relationships between variables...

    This dialog box comes up first.

    Select Matrix of Plots – Simple, since we have only one (Y)
    variable and no groups.

    Click on OK to go to the next dialog box.




   Step 2: Visualize the Data!



    Double click on all of the
    variables you want to include in
    the Matrix, to place them in the
    Graph variables box

    Select Matrix Options to move
    on to the next dialog box








   Step 2: Visualize the Data!


    Select Lower left to place all
    the graph labels to the
    lower left of the boxes

    Click on OK here and on the
    previous dialog box to get
    the matrix








   Correlation Table
      There appear to be some relationships between certain variables
      and the response.

      [Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag.
      Amt of Ag is the response variable (Y). Callout: "Is this good or bad?"]




 Quantify the Relationships Between Variables
  Step 3: Quantify the relationship

Select Stat>Basic Statistics>Correlation




   Correlation Matrix
  Evaluating coefficients of correlation among predictors...




    Double click on all of the
    variables you want to
    include, to place them in
    the Variables box

    Check to display p-values
    (default setting)

    Click on OK to get the
    Correlation Matrix in your
    Session Window






   Correlation Matrix

      [Figure: Minitab correlation output for the Production Plant variables]

      The TOP number in each pair is the Pearson coefficient of correlation
      (r-value), while the BOTTOM number is the p-value.

       Pairwise correlations between predictor variables larger than about 0.5 to 0.7
       (in absolute value) are signs of trouble: multicollinearity. We will explain more shortly.
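A Pearson correlation matrix like the one in this output can be sketched in plain Python. The example assumes hypothetical predictor columns X1, X2, X3 (X2 is built to be nearly proportional to X1) and uses 0.7, the upper end of the 0.5 to 0.7 rule of thumb, as the flagging threshold; it does not compute the p-values Minitab also prints.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(columns, threshold=0.7):
    """List predictor pairs whose |r| exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical predictors: X2 is nearly a multiple of X1.
predictors = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 10.5],
    "X3": [3, 1, 4, 1, 5],
}
print(flag_collinear_pairs(predictors))  # [('X1', 'X2', 0.999)]
```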




   Finding the Regression Equation...
  Step 4: Develop a prediction model

    Select: Stat>
    Regression>
    Regression








   Finding the Regression Equation... (Cont.)



    Double click on C5 Amt of Ag
    and place it in the Response:
    variable box, then double
    click on all the variables you
    want to place in the Predictors:
    box.

    Select Options to go to the next
    dialog box.




   Finding the Regression Equation... (Cont.)



    In this dialog box, the only
    thing you have to do is check
    Variance inflation factors.

    Click on OK here and on the
    previous dialog box to get the
    regression analysis in your
    Session Window.




   Regression Equation
  Minitab displays the following regression equation:
       Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

       Predictor            Coef     SE Coef          T                 P      VIF
       Constant             5.72       10.83       0.53             0.607
       Temp             -0.01558     0.02616      -0.60             0.563    1.276
       Speed              0.2393      0.2644       0.90             0.383   10.997
       Thickness           0.443       1.033       0.43             0.675   11.671
       Water             0.04495     0.01481       3.04             0.010    1.731

       S = 0.412748            R-Sq = 80.9%       R-Sq(adj) = 74.5%

       • This new model explains 80.9% of the response variability (R-Sq).
       • R-Sq(adj) adjusts R-Sq for the degrees of freedom used by terms that have
         no real value. It should be used when comparing models with different
         numbers of terms.
       • The P-values indicate whether a particular predictor is significant in the
         presence of the other predictors in the model.
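The T column is Coef divided by SE Coef, and R-Sq(adj) follows from R-Sq, the number of observations n, and the number of predictors k. The sketch below recomputes both from the printed numbers; it assumes n = 17, the N shown on the residual normal probability plot for this data. Differences in the last digit come from rounding in the printed coefficients.

```python
# Values copied from the Minitab regression output for the Production Plant data.
N_OBS = 17  # N reported on the normal probability plot of the residuals
N_PREDICTORS = 4
R_SQ = 0.809

TABLE = {  # predictor: (Coef, SE Coef, printed T)
    "Constant": (5.72, 10.83, 0.53),
    "Temp": (-0.01558, 0.02616, -0.60),
    "Speed": (0.2393, 0.2644, 0.90),
    "Thickness": (0.443, 1.033, 0.43),
    "Water": (0.04495, 0.01481, 3.04),
}


def t_value(coef, se_coef):
    """T statistic for the null hypothesis that the coefficient is zero."""
    return coef / se_coef


def adjusted_r_sq(r_sq, n, k):
    """R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - k - 1) for k predictors plus a constant."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


for name, (coef, se, printed_t) in TABLE.items():
    print(f"{name:10s} T = {t_value(coef, se):6.3f} (printed {printed_t})")
print(f"R-Sq(adj) = {adjusted_r_sq(R_SQ, N_OBS, N_PREDICTORS):.1%}")  # 74.5%
```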




   Interpreting P-values
     • The P column gives the p-value (observed significance level) for each
       term in the model
     • Typically, if a p-value is less than or equal to 0.05, the term is
       considered significant (i.e., the null hypothesis that its coefficient
       is zero is rejected)
     • If a p-value is greater than 0.10, the term is usually removed from the
       model. A practitioner might leave the term in the model if the p-value
       falls in the gray region between these two levels
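The decision rule above can be written as a small function. This is a sketch of the slide's rule of thumb only; the 0.05 and 0.10 cutoffs are parameters, and the p-values are the ones printed for the Production Plant model.

```python
def classify_term(p_value, alpha=0.05, remove_above=0.10):
    """Apply the rule of thumb to one term's p-value."""
    if p_value <= alpha:
        return "significant"
    if p_value > remove_above:
        return "candidate for removal"
    return "gray region (judgment call)"


# P column from the Production Plant regression output
p_values = {"Temp": 0.563, "Speed": 0.383, "Thickness": 0.675, "Water": 0.010}
for term, p in p_values.items():
    print(f"{term:10s} P = {p:.3f} -> {classify_term(p)}")
```

Applied mechanically, the rule flags Temp, Speed, and Thickness together. Because Speed and Thickness are strongly collinear, a common practice is to remove one term at a time and refit, since the remaining p-values can change substantially.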







   Regression Equation
                      Regression output in Minitab's Session Window

        This is the same Session Window output shown earlier under Regression
        Equation. The VIF column reports the Variance Inflation Factor for each
        predictor (Speed 10.997, Thickness 11.671, Temp 1.276, Water 1.731).

                       High VIF values are signs of trouble (VIF > 10)




   Problems with Several Predictor Variables
     • Sometimes the Xs are correlated (dependent). This condition is
       known as multicollinearity
     • Multicollinearity can cause problems (sometimes severe):
         – Estimates of the coefficients are affected (unstable, inflated
           variances)
         – It is difficult to isolate the effect of each X
         – Coefficients depend on which Xs are included in the model
     • High multicollinearity inflates the standard errors of the coefficient
       estimates, which shrinks the T values and increases the P-values
     • In cases of extreme multicollinearity, Minitab will remove one term
       and notify you




 Graphical Representation of Multicollinearity

   [Figure: overlapping regions for the variation explained by X1, the variation
   explained by X2, and the total variation in Y]

                        • Overlap represents correlation
                        • X1 and X2 are both correlated with Y
                        • X1 and X2 are highly correlated
                        • If X1 is in the model, we don't need X2, and
                          vice versa




   Assessing the Degree of Multicollinearity
     • We use a metric called the Variance Inflation Factor (VIF):

             $\mathrm{VIF}_i = \dfrac{1}{1 - R_i^2}$

       To display it, select Stat>Regression>Regression>Options>
       Display variance inflation factors
     • Where:
         – $R_i^2$ is the R² value you get when you regress Xi against the other X's
         – A large $R_i^2$ suggests that a variable is redundant
     • Rule of Thumb:
         – $R_i^2 > 0.9$ is a cause for concern: high degree of collinearity (VIF > 10)
         – $0.8 < R_i^2 < 0.9$: moderate degree of collinearity (5 < VIF < 10)

     • For the Production Plant data, Minitab gives us:

                                   VIF
                 Temp              1.276
                 Speed             10.997
                 Thickness         11.671
                 Water             1.731

       Two VIFs are a bit large, but in this case, with an R-Sq of 80.9%,
       some multicollinearity can be tolerated
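Because VIF_i = 1/(1 − R_i²), each reported VIF can be converted back to R_i², the share of Xi's variation explained by the other predictors. The sketch below does this for the VIFs Minitab printed and applies the rule-of-thumb thresholds (VIF > 10 high, VIF > 5 moderate).

```python
def r_sq_from_vif(vif):
    """Invert VIF_i = 1 / (1 - R_i^2) to recover R_i^2."""
    return 1 - 1 / vif


def collinearity_level(vif):
    """Rule of thumb: VIF > 10 is high, VIF > 5 is moderate, otherwise low."""
    if vif > 10:
        return "high"
    if vif > 5:
        return "moderate"
    return "low"


vifs = {"Temp": 1.276, "Speed": 10.997, "Thickness": 11.671, "Water": 1.731}
for name, vif in vifs.items():
    print(f"{name:10s} VIF = {vif:6.3f}  R_i^2 = {r_sq_from_vif(vif):.3f}  "
          f"{collinearity_level(vif)}")
```

For this data, Speed and Thickness each have roughly 91% of their variation explained by the other predictors (R_i² of about 0.909 and 0.914).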




   Some Cautions About the Coefficients
     • Remember the prediction equation obtained earlier:
          Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water

     • The relative importance of predictors cannot be determined from the
       size of their coefficients:
         – The coefficients are scale dependent
         – The coefficients are influenced by correlation among the
           predictor variables
         – If a high degree of multicollinearity exists, even the signs of
           the coefficients may be misleading
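A quick way to see scale dependence: changing a predictor's units rescales its coefficient without changing the fit. For example, the Water coefficient of 0.04495 pounds/day of silver per additional gallon/day of water is the same effect as 44.95 pounds/day per additional thousand gallons/day. The sketch below shows this with a one-predictor least-squares fit on hypothetical data (not the Production Plant data).

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: water use (gallons/day) and silver use (pounds/day).
water_gal = [150.0, 155.0, 160.0, 165.0, 170.0]
silver = [19.2, 19.6, 20.1, 20.3, 20.9]
water_kgal = [w / 1000 for w in water_gal]  # same data in thousands of gallons/day

b0_gal, b1_gal = simple_fit(water_gal, silver)
b0_kgal, b1_kgal = simple_fit(water_kgal, silver)
print(b1_gal, b1_kgal)  # the slope grows by a factor of 1000 ...

# ... yet predictions at the same physical water use are identical.
pred_gal = b0_gal + b1_gal * 162.0
pred_kgal = b0_kgal + b1_kgal * 0.162
print(round(pred_gal, 6), round(pred_kgal, 6))
```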







   Residual Analysis
  Step 5: Validate the selected model


    Select Stat>
    Regression>
    Regression



     Is there anything
      suspicious with
        this model?








   Residual Analysis (Cont.)



  Double click on C5 Amt of Ag
  and place it in the Response
  variable box, then double
  click on all the variables you
  want to place in the Predictors
  box.

  Select Graphs to go to the next
  dialog box.




   Residual Analysis (Cont.)



 Select Four in one to get all four
 residual plots on one graph, or
 pick and choose the plots
 you want.

 Click on OK here and on the previous
 dialog box to get the residual plots.




   Residual Analysis (Cont.)
                  Not too bad overall…

   [Figure: Residual Plots for Amt of Ag, four panels: Normal Probability Plot
   (N = 17, AD = 0.249, P-Value = 0.705), Versus Fits, Histogram, Versus Order]

   If you want to see the value for any observation, just hold your cursor
   over that point.
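The residual plots look for patterns; unusual observations can also be screened numerically. This is a simplified sketch on hypothetical data: it fits a two-predictor model by least squares, computes the residuals and S, and flags observations whose residual exceeds 2·S in absolute value. Minitab's standardized residuals additionally divide each residual by sqrt(1 − leverage), so its values differ, and the cutoff of 2 is a common convention assumed here rather than a rule stated in this module.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_least_squares(xs, ys):
    """Return [B0, B1, ..., Bk] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(x) for x in xs]
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(xtx, xty)


# Hypothetical 3 x 3 grid of coded settings for two predictors. The underlying
# model is Y = 10 + 2*X1 - X2, but the centre run (index 4) is recorded 5 units high.
X = [(x1, x2) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]
Y = [10.0 + 2 * x1 - x2 for x1, x2 in X]
Y[4] += 5.0

b = fit_least_squares(X, Y)
fitted = [b[0] + b[1] * x1 + b[2] * x2 for x1, x2 in X]
residuals = [yi - fi for yi, fi in zip(Y, fitted)]

n, p = len(Y), len(b)  # p counts the constant as a parameter
s = math.sqrt(sum(e * e for e in residuals) / (n - p))  # plays the role of Minitab's S

# Simplified screen: residual / S (ignores leverage).
scaled = [e / s for e in residuals]
unusual = [i for i, z in enumerate(scaled) if abs(z) > 2]
print("unusual observations:", unusual)  # [4]
```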




   How to Address Multicollinearity
     • Eliminate one or more input variables
         – We'll look at a technique called Best Subsets Regression
     • Collect additional data
     • Use process knowledge to determine the principal relationship
     • Use DOE to further assess the multicollinearity
     • If neither of the correlated variables is significant, eliminate both
       from the analysis




   Best Subsets Regression
     • Rather than relying on the p-values alone, the computer fits every
       possible combination of the variables and prints the characteristics
       of the best models of each size
     • Statistics like adjusted R-Sq and MSE (mean squared error) will improve
       as important model terms are added, then worsen as "junk" terms are
       added to the model
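The idea behind Best Subsets can be sketched by brute force: fit every non-empty subset of predictors and score each with R-Sq, R-Sq(adj), Mallows Cp, and S. The predictor columns X1 to X3 and the response are hypothetical, and this sketch prints every subset rather than only the best few per size as Minitab does.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def sse_for(columns, y):
    """Least-squares fit of y on a constant plus the given columns; return SSE."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(m)]
    b = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2 for r, yi in zip(rows, y))


def best_subsets(predictors, y):
    """Score every non-empty subset of predictors with R-Sq, R-Sq(adj), Cp and S."""
    n = len(y)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    names = list(predictors)
    # Mallows Cp uses the MSE of the model with all candidate predictors.
    mse_full = sse_for([predictors[v] for v in names], y) / (n - len(names) - 1)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            sse = sse_for([predictors[v] for v in subset], y)
            p = k + 1  # parameters, including the constant
            results.append({
                "vars": subset,
                "r_sq": 1 - sse / sst,
                "r_sq_adj": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "cp": sse / mse_full - (n - 2 * p),
                "s": math.sqrt(sse / (n - p)),
            })
    return results


# Hypothetical data for illustration only.
predictors = {
    "X1": [1, 2, 3, 4, 5, 6, 7, 8],
    "X2": [2, 1, 4, 3, 6, 5, 8, 7],
    "X3": [5, 3, 8, 1, 7, 2, 6, 4],
}
y = [3.1, 3.9, 6.2, 6.8, 9.1, 9.7, 12.2, 12.8]

results = best_subsets(predictors, y)
for r in sorted(results, key=lambda r: (len(r["vars"]), -r["r_sq"])):
    print(f"{len(r['vars'])} {'+'.join(r['vars']):9s} R-Sq {r['r_sq']:.1%} "
          f"adj {r['r_sq_adj']:.1%} Cp {r['cp']:5.2f} S {r['s']:.4f}")
```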








   Best Subsets Regression Considerations
     • Objective: We want to select a model with predictive accuracy and
       minimum multicollinearity
     • Seek a compromise between:
         – Overfitting (including model terms with only marginal, or no,
           contribution)
         – Underfitting (ignoring or deleting relatively important model terms)
     • What are some problems with overfitting?
     • What are some problems with underfitting?

     [Figure: sketches of an overfit and an underfit model]




   Best Subsets Regression
    Evaluating Candidate Models
          Four things to look at when evaluating candidate models:
           1.    R² (large R² is desired; however, R² never decreases as predictors
                 are added to the model, so use it only to compare models with
                 the same number of terms)
           2.    Adjusted R² (large is desired)
           3.    Mallows Cp statistic (small Cp is desired, close to the number of
                 terms in the model, including the constant)
           4.    s, the estimate of the standard deviation around the regression
                 (small is desired)
     Generally, the best three models are selected and checked for
     significance of all factors and for the residual assumptions
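For reference, these are the standard definitions behind the four criteria. Here n is the number of observations, a candidate model has p parameters (its predictors plus the constant) and error sum of squares SSE_p, SST is the total sum of squares, and MSE_full is the mean squared error of the model containing all candidate predictors:

```latex
R^2 = 1 - \frac{SSE_p}{SST}, \qquad
R^2_{adj} = 1 - \frac{SSE_p/(n-p)}{SST/(n-1)}, \qquad
s = \sqrt{\frac{SSE_p}{n-p}}, \qquad
C_p = \frac{SSE_p}{MSE_{full}} - (n - 2p)
```

For the model that contains all candidate predictors, SSE_p / MSE_full = n − p, so C_p = p by construction.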








   More on the Mallows Cp Statistic
     • In practice, the number of parameters needed in the model is suggested
       by the model at which Mallows' Cp statistic reaches its minimum

     • Rule of Thumb:
         – We want Cp ≈ p, the number of terms in the model (the number of
           input variables plus one for the constant)




   Best Subsets Regression
                      Minitab data set: Production Plant

      Select Stat>
      Regression>
      Best Subsets








   Best Subsets Regression (Cont.)



       Enter Response variable

       Enter Predictor variables
       (Input Variables)

       Click on OK to get analysis
       in Session Window








   Best Subsets Regression (Cont.)
       Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water

       Response is Amt of Ag

                                                                       T
                                                                       h
                                                                       i
                                                                       c
                                                                   S   k   W
                                                              T    p   n   a
                                                              e    e   e   t
                                   Mallows                    m    e   s   e
       Vars     R-Sq   R-Sq(adj)        Cp         S          p    d   s   r
          1     64.4        62.0       9.4   0.50387                   X
          1     62.3        59.8      10.7   0.51836               X
          2     80.0        77.2       1.5   0.39047               XX
          2     78.8        75.8       2.3   0.40200              X X
          3     80.6        76.1       3.2   0.39959          X X   X
          3     80.3        75.8       3.4   0.40237            X X X
          4     80.9        74.5       5.0   0.41275          X X X X

       What model(s) are the best candidates?
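Because every row of this table uses the same response and the same 17 observations (the N on the residual normal probability plot), the R-Sq(adj), Cp, and S columns can be recomputed from the R-Sq column and the full model's S = 0.412748. The sketch below does that from the standard definitions of these statistics; small differences come from R-Sq being printed to one decimal.

```python
import math

N = 17  # observations (N on the residual normal probability plot)
FULL_K, FULL_R_SQ, FULL_S = 4, 0.809, 0.412748  # model with all four predictors

# (Vars, R-Sq, R-Sq(adj), Mallows Cp, S) as printed in the Best Subsets table
ROWS = [
    (1, 0.644, 0.620, 9.4, 0.50387),
    (1, 0.623, 0.598, 10.7, 0.51836),
    (2, 0.800, 0.772, 1.5, 0.39047),
    (2, 0.788, 0.758, 2.3, 0.40200),
    (3, 0.806, 0.761, 3.2, 0.39959),
    (3, 0.803, 0.758, 3.4, 0.40237),
    (4, 0.809, 0.745, 5.0, 0.41275),
]

# Total sum of squares implied by the full model: S^2 = (1 - R-Sq) * SST / (n - k - 1)
SST = FULL_S ** 2 * (N - FULL_K - 1) / (1 - FULL_R_SQ)
MSE_FULL = FULL_S ** 2


def recompute(k, r_sq):
    """Return (R-Sq(adj), Cp, S) for a k-predictor model with the given R-Sq."""
    p = k + 1  # parameters, including the constant
    sse = (1 - r_sq) * SST
    adj = 1 - (sse / (N - p)) / (SST / (N - 1))
    cp = sse / MSE_FULL - (N - 2 * p)
    s = math.sqrt(sse / (N - p))
    return adj, cp, s


for k, r_sq, adj, cp, s in ROWS:
    adj2, cp2, s2 = recompute(k, r_sq)
    print(f"{k} vars R-Sq {r_sq:.1%}: adj {adj2:.1%} (printed {adj:.1%}), "
          f"Cp {cp2:.1f} (printed {cp}), S {s2:.4f} (printed {s})")
```

The full four-variable model has Cp = 5.0, equal to its number of parameters (four predictors plus the constant), which is why Cp is compared against the number of terms including the constant.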








   Best Subsets Regression (Cont.)
       Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water

       Response is Amt of Ag

                                                                       T
                                                                       h
       R-Sq: Look for the highest value                                i
       when comparing models with the                                  c
                                                                   S   k   W
       same number of input variables
                                                              T    p   n   a
                                                              e    e   e   t
                                   Mallows                    m    e   s   e
       Vars     R-Sq   R-Sq(adj)        Cp         S          p    d   s   r
          1     64.4        62.0       9.4   0.50387                   X
          1     62.3        59.8      10.7   0.51836               X
          2     80.0        77.2       1.5   0.39047               XX
          2     78.8        75.8       2.3   0.40200              X X
          3     80.6        76.1       3.2   0.39959          X X   X
          3     80.3        75.8       3.4   0.40237            X X X
          4     80.9        74.5       5.0   0.41275          X X X X
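
The Best Subsets table can also be reproduced outside Minitab. The following is a minimal Python/NumPy sketch, not Minitab's own algorithm. It fits ordinary least squares to every subset of inputs and reports R-Sq, R-Sq(adj), Mallows' Cp (computed against the full model's mean squared error) and S. The data are hypothetical synthetic values with inputs named x1 to x4, not the Amt of Ag data. The helper names `sse_of_fit` and `best_subsets` are illustrative.

```python
import itertools

import numpy as np


def sse_of_fit(A: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit; A already contains the constant column."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X: np.ndarray, y: np.ndarray, names: list[str]) -> list[dict]:
    """Fit every non-empty subset of input columns and return Best Subsets statistics."""
    n, k = X.shape
    ones = np.ones((n, 1))
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = sse_of_fit(np.hstack([ones, X]), y) / (n - k - 1)  # Cp uses the full model's MSE
    rows = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            p = size + 1  # terms in the model, including the constant
            sse = sse_of_fit(np.hstack([ones, X[:, list(cols)]]), y)
            rows.append({
                "vars": size,
                "predictors": [names[c] for c in cols],
                "r_sq": 100 * (1 - sse / sst),
                "r_sq_adj": 100 * (1 - (sse / (n - p)) / (sst / (n - 1))),
                "cp": sse / mse_full - (n - 2 * p),
                "s": (sse / (n - p)) ** 0.5,
            })
    return rows


# Hypothetical synthetic data: the response depends on x2 and x4 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(17, 4))
y = 10 + 2 * X[:, 1] + 3 * X[:, 3] + rng.normal(scale=0.3, size=17)

results = best_subsets(X, y, ["x1", "x2", "x3", "x4"])
for size in range(1, 5):
    # Like the table above, show the two models with the highest R-Sq for each size.
    top_two = sorted((r for r in results if r["vars"] == size),
                     key=lambda r: r["r_sq"], reverse=True)[:2]
    for r in top_two:
        print(f'{r["vars"]}  {r["r_sq"]:5.1f}  {r["r_sq_adj"]:5.1f}  '
              f'{r["cp"]:6.1f}  {r["s"]:.5f}  {r["predictors"]}')
```

With four inputs there are only 15 subsets, so exhaustive fitting is cheap. The count grows as 2^k - 1, so brute force becomes impractical when there are many candidate inputs.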

   Best Subsets Regression (Cont.)
       R-Sq(adj): Look for the highest value when comparing models with
       different numbers of input variables.
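
R-Sq(adj) penalizes each added term: R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - p), where p counts the input variables plus the constant. The short check below applies that formula to the rows of the Best Subsets table. The sample size is not shown in this section. We assume n = 17 because that value reproduces the reported R-Sq(adj) column to within the one-decimal rounding of R-Sq.

```python
def adjusted_r_sq(r_sq_pct: float, n: int, p: int) -> float:
    """R-Sq(adj) in percent; p counts every term in the model, including the constant."""
    return 100 * (1 - (1 - r_sq_pct / 100) * (n - 1) / (n - p))


# (Vars, R-Sq %, reported R-Sq(adj) %) copied from the Best Subsets table.
table = [(1, 64.4, 62.0), (1, 62.3, 59.8), (2, 80.0, 77.2), (2, 78.8, 75.8),
         (3, 80.6, 76.1), (3, 80.3, 75.8), (4, 80.9, 74.5)]
N = 17  # assumed sample size: not shown in this section, but it reproduces the reported column

for n_vars, r_sq, reported in table:
    adj = adjusted_r_sq(r_sq, N, p=n_vars + 1)
    print(f"{n_vars} vars: R-Sq = {r_sq:.1f}  computed R-Sq(adj) = {adj:.1f}  reported = {reported:.1f}")
```

The penalty explains the table: the four-variable model has the highest R-Sq (80.9%) but a lower R-Sq(adj) (74.5%) than the best two-variable model (77.2%).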

   Best Subsets Regression (Cont.)
       Cp: Look for models where Mallows' Cp is small and close to p, the
       number of terms in the model (the input variables plus the constant).
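
Mallows' Cp compares a subset model's error sum of squares with the full model's mean squared error: Cp = SSE_p / MSE_full - (n - 2p). Because S² = SSE/(n - p), Cp can be recomputed from the S column alone. This sketch again assumes n = 17, which is not stated in this section. It takes MSE_full from the S of the four-variable row.

```python
# Rows copied from the Best Subsets table: (Vars, S, reported Mallows Cp).
rows = [(1, 0.50387, 9.4), (1, 0.51836, 10.7), (2, 0.39047, 1.5), (2, 0.40200, 2.3),
        (3, 0.39959, 3.2), (3, 0.40237, 3.4), (4, 0.41275, 5.0)]
N = 17             # assumed sample size (not shown in this section)
S_FULL = 0.41275   # S of the full model with all four inputs


def mallows_cp(s_subset: float, n_vars: int, n: int = N, s_full: float = S_FULL) -> float:
    """Cp = SSE_p / MSE_full - (n - 2p), where p = input variables + 1 (the constant)."""
    p = n_vars + 1
    sse_subset = (n - p) * s_subset ** 2   # because S^2 = SSE / (n - p)
    mse_full = s_full ** 2                 # for the full model, MSE = S^2
    return sse_subset / mse_full - (n - 2 * p)


for n_vars, s, reported in rows:
    print(f"{n_vars} vars (p = {n_vars + 1}): computed Cp = {mallows_cp(s, n_vars):.2f}  reported = {reported}")
```

By construction, the full model's Cp always equals p. That is why the four-variable row shows exactly 5.0.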

   Best Subsets Regression (Cont.)
       S: We want S, the estimate of the standard deviation about the
       regression, to be as small as possible.

   Once the Candidate Models Are Identified
         Evaluate the candidate models under a “microscope”
               Outliers
               High leverage
               Influential observations
               Residuals
               Prediction quality
         Once a model has been selected, find the new
          regression equation
         Test its predictive capability for observations NOT
          originally used in the modeling
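
One simple way to test predictive capability on observations not used in the modeling is a holdout split. You fit the model on part of the data and measure prediction error on the rest. The sketch below is a minimal NumPy illustration on hypothetical synthetic data. The 30/10 split, the helper names `fit_ols`, `predict` and `rmse`, and RMSE as the error measure are our assumptions, not part of the original procedure.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficients [constant, b1, ..., bk] for inputs X (no constant column)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted response for new inputs X."""
    return beta[0] + X @ beta[1:]


def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root mean squared prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Hypothetical synthetic data standing in for historical records.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 2))
y = 5 + 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.4, size=40)

# Hold out 10 randomly chosen rows that the model never sees while it is fitted.
idx = rng.permutation(len(y))
test, train = idx[:10], idx[10:]
beta = fit_ols(X[train], y[train])

print("coefficients:", np.round(beta, 3))
print("RMSE on fitting data: ", round(rmse(y[train], predict(beta, X[train])), 3))
print("RMSE on held-out data:", round(rmse(y[test], predict(beta, X[test])), 3))
```

Suppose the held-out RMSE is much larger than the RMSE on the fitting data. Then the model may be overfit, or the new observations may come from different conditions than the historical ones.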

     Regression with Reduced Model
  We select the best model with two variables, Speed & Water,
  and run Minitab again to obtain the new regression equation:



     Select Stat > Regression > Regression

  Regression with Reduced Model (Cont.)


  Enter Amt of Ag as the
  Response

  Enter only Speed and Water
  as Predictors

  Click on OK to get analysis
  in Session Window

  Regression with Reduced Model (Cont.)
                  The Minitab Session window gives the following regression
                  equation for the reduced model:
                         Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
                         Predictor          Coef       SE Coef               T          P
                         Constant          9.919         1.694            5.86      0.000
                         Speed           0.35689       0.08544            4.18      0.001
                         Water           0.04253       0.01206            3.53      0.003
                         S = 0.3905        R-Sq = 80.0%          R-Sq(adj) = 77.2%



                …to compare with the previous model:
                      Amt of Ag = 5.7 - 0.0156 Temp. + 0.239 Speed
                      + 0.44 Thickness + 0.0449 Water
                      Predictor          Coef      SE Coef                T          P
                      Constant           5.72        10.83             0.53      0.607
                      H20 Temp       -0.01558      0.02616            -0.60      0.563
                      Speed            0.2393       0.2644             0.90      0.383
                      Thick.            0.443        1.033             0.43      0.675
                      Water           0.04495      0.01481             3.04      0.010
                      S = 0.4127        R-Sq = 80.9%         R-Sq(adj) = 74.5%
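
The Coef, SE Coef and T columns can be computed directly. The NumPy sketch below uses a hypothetical helper `regression_table` on synthetic data. It omits p-values, because they would need a t-distribution function. It fits a reduced and a full model in which the extra input x3 is nearly a copy of x1. This illustrates a pattern like the one in the two outputs above: Speed's SE Coef grows from 0.08544 in the reduced model to 0.2644 in the full model. A jump of that kind is often a sign of correlated inputs (multicollinearity).

```python
import numpy as np


def regression_table(X: np.ndarray, y: np.ndarray, names: list[str]) -> dict:
    """OLS fit printing Coef, SE Coef and T in the style of Minitab's session window."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    p = k + 1                                           # terms, including the constant
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    s = np.sqrt(sse / (n - p))                          # S
    se = s * np.sqrt(np.diag(np.linalg.inv(A.T @ A)))   # SE Coef
    sst = float(((y - y.mean()) ** 2).sum())
    r_sq = 1 - sse / sst
    r_sq_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    print(f"{'Predictor':<10}{'Coef':>10}{'SE Coef':>10}{'T':>8}")
    for name, b, e in zip(["Constant"] + names, beta, se):
        print(f"{name:<10}{b:>10.4f}{e:>10.4f}{b / e:>8.2f}")
    print(f"S = {s:.4f}   R-Sq = {100 * r_sq:.1f}%   R-Sq(adj) = {100 * r_sq_adj:.1f}%\n")
    return {"coef": beta, "se": se, "t": beta / se, "s": s, "r_sq": r_sq}


# Hypothetical synthetic data; x3 is almost a copy of x1.
rng = np.random.default_rng(2)
n = 17
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.05, size=n)
y = 3 + 0.5 * x1 + 0.4 * x2 + rng.normal(scale=0.3, size=n)

reduced = regression_table(np.column_stack([x1, x2]), y, ["x1", "x2"])
full = regression_table(np.column_stack([x1, x2, x3]), y, ["x1", "x2", "x3"])
```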

Unusual Observations

           Session window of Minitab also gives us the following output:

        Unusual Observations
        Obs      Speed   Amt of A           Fit                SE Fit   Residual     St Resid
          3       11.5    21.0000       20.3784                0.2477     0.6216         2.06R

        R denotes an observation with a large standardized residual




           Here, “unusual” means a large standardized residual (flagged R);
           Minitab also flags observations with large leverage (flagged X)


                             Let’s see what would happen if we
                               eliminated such an observation
                                   from our collected data!
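
The St Resid value can be checked by hand from the printed numbers. SE Fit equals S * sqrt(h), where h is the observation's leverage. The standardized residual is Residual / (S * sqrt(1 - h)). The sketch below uses only the values printed above for observation 3. The 2p/n line also assumes n = 17 observations, which is not shown in this section.

```python
import math

# Values printed by Minitab for observation 3 of the reduced (Speed, Water) model.
s = 0.3905          # S of the reduced model
se_fit = 0.2477     # SE Fit of observation 3
residual = 0.6216   # Amt of Ag minus Fit: 21.0000 - 20.3784

leverage = (se_fit / s) ** 2                          # SE Fit = S * sqrt(h), so h = (SE Fit / S)^2
st_resid = residual / (s * math.sqrt(1 - leverage))   # standardized residual

p, n = 3, 17  # p = constant + Speed + Water; n = 17 is an assumption
print(f"leverage h = {leverage:.3f}")
print(f"2p/n = {2 * p / n:.3f}")
print(f"standardized residual = {st_resid:.2f}")      # Minitab reports 2.06
```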

Impact of the Unusual Observation
               Without the Unusual Observation, the Session window of Minitab
               yields the following regression equation:
               Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water
               Predictor        Coef     SE Coef                  T       P
               Constant        8.610       1.567               5.49   0.000
               Speed         0.23698     0.08960               2.64   0.020
               Water         0.05775     0.01226               4.71   0.000
               S = 0.3383      R-Sq = 85.0%          R-Sq(adj) = 82.7%
               …to compare with the regression equation of our
               previous reduced model
               Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
               Predictor        Coef     SE Coef                  T       P
               Constant        9.919       1.694               5.86   0.000
               Speed         0.35689     0.08544               4.18   0.001
               Water         0.04253     0.01206               3.53   0.003
               S = 0.3905      R-Sq = 80.0%          R-Sq(adj) = 77.2%

               R-Sq goes up (from 80.0% to 85.0%) because we’ve gotten rid
               of “noise” in the model
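
Cook's distance summarizes how much all the fitted values shift when a single observation is deleted, which is what this comparison does by refitting. It can be computed from the standardized residual r and the leverage h as D = r² * h / (p * (1 - h)). Below is a minimal sketch that uses only the numbers Minitab printed for observation 3, with p = 3 terms (constant, Speed, Water).

```python
import math

# Values printed by Minitab for observation 3 of the reduced model.
s, se_fit, residual = 0.3905, 0.2477, 0.6216
p = 3  # terms in the reduced model: constant, Speed, Water

h = (se_fit / s) ** 2                    # leverage, from SE Fit = S * sqrt(h)
r = residual / (s * math.sqrt(1 - h))    # standardized residual (Minitab: 2.06)
cooks_d = r ** 2 * h / (p * (1 - h))     # Cook's distance

print(f"Cook's D for observation 3 = {cooks_d:.2f}")
```

Common rules of thumb flag D > 1 or D > 4/n. At about 0.95, this observation is borderline by the first rule and above the second, since 4/17 is about 0.24 if we again assume n = 17.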

   Takeaways
      Regression analysis can be used with historical data as well as
         data from designed experiments to build prediction models
      Care must be exercised when using historical data
             Correlation does not imply a cause-and-effect relationship
             There may be serious problems with multicollinearity and
              high-leverage observations
      There are several diagnostic tools available to evaluate
         regression models:
             Fit: R², adjusted R², Cp, S
             Unusual observations: residual plots, leverage, Cook’s D
             Multicollinearity: VIFs (Variance Inflation Factors)
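
A VIF measures how much a coefficient's variance is inflated by correlation with the other inputs. It is VIF_j = 1/(1 - R_j²), where R_j² comes from regressing input j on all the other inputs. Below is a minimal NumPy sketch with hypothetical synthetic inputs, where x3 is built from x1. The helper name `vifs` is illustrative.

```python
import numpy as np


def vifs(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X (inputs only, no constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Regress input j on a constant plus all the other inputs.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r_sq_j = 1 - (resid @ resid) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out[j] = 1 / (1 - r_sq_j)
    return out


# Hypothetical synthetic inputs: x3 is x1 plus a little noise, x2 is unrelated.
rng = np.random.default_rng(3)
x1 = rng.normal(size=30)
x2 = rng.normal(size=30)
x3 = x1 + rng.normal(scale=0.1, size=30)
v = vifs(np.column_stack([x1, x2, x3]))
print(np.round(v, 1))
```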

   Considerations in Regression
         Set goals before doing the analysis (what do you want to learn,
          how well do you need to predict, etc.).
         Gather enough observations to adequately measure error and
          check the model assumptions.
         Make sure that the sample of data is representative of the
          population.
         Excessive measurement error of the inputs (Xs) creates
          uncertainty in the estimated coefficients, predictions, etc.
         Be sure to collect data on all potentially important explanatory
          variables.



                                    Multiple Regression           UNCLASSIFIED / FOUO 49
UNCLASSIFIED / FOUO




   Regression Checklist
        Scatterplots (Y vs. X)
        Histograms and/or boxplots of Ys and Xs
        Coefficients
        Significance (p-value < .05 to .10)
        R² and adjusted R²
        S
        Residuals (no obvious pattern)
        Unusual Y values (|standardized residual| > 2)
        Unusual X values (leverage > 2p/n, where p = number of terms
         including the constant and n = number of observations)
        Overfitting vs. underfitting (Cp ≈ p)
        Multicollinearity (VIF > 5 to 10)
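
The "Unusual Y" and "Unusual X" checks can be automated from the hat matrix H = X(XᵀX)⁻¹Xᵀ, whose diagonal holds the leverages. This sketch assumes a constant term is added to the inputs. It uses hypothetical synthetic data with one planted outlier and applies the cut-offs |standardized residual| > 2 and leverage > 2p/n. The helper name `regression_flags` is illustrative.

```python
import numpy as np


def regression_flags(X: np.ndarray, y: np.ndarray) -> dict:
    """Leverages and standardized residuals for an OLS fit with a constant, plus checklist flags."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]                          # number of terms, including the constant
    H = A @ np.linalg.inv(A.T @ A) @ A.T    # hat matrix
    h = np.diag(H)                          # leverage of each observation
    resid = y - H @ y                       # y minus fitted values
    s = np.sqrt(resid @ resid / (n - p))    # S
    st_resid = resid / (s * np.sqrt(1 - h))
    return {
        "leverage": h,
        "st_resid": st_resid,
        "unusual_y": np.flatnonzero(np.abs(st_resid) > 2),  # |standardized residual| > 2
        "unusual_x": np.flatnonzero(h > 2 * p / n),         # leverage > 2p/n
    }


# Hypothetical data: row 0 has extreme input values, row 5 has an outlying response.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 2))
X[0] = [6.0, 6.0]
y = 1 + X @ np.array([0.5, 0.3]) + rng.normal(scale=0.2, size=20)
y[5] += 2.0

flags = regression_flags(X, y)
print("unusual Y rows:", flags["unusual_y"])
print("unusual X rows:", flags["unusual_x"])
```

The leverages always sum to p, so 2p/n is simply twice the average leverage.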

        What other comments or questions
                  do you have?


                          National Guard
                         Black Belt Training
                                  APPENDIX
                          Additional Exercises
                              Anthony’s Pizza
                              Customer Satisfaction
                              A Study of Supervisor Performance

   Additional Practice Example:
   Anthony’s Pizza
      We have received Voice of the Customer feedback telling us that
         customers are dissatisfied when their delivery will run past the
         30-minute target and we cannot accurately predict when it will arrive
      We would like to develop a model so that, when a customer calls, we
         can accurately predict the delivery time

   Additional Practice Example:
   Six Sigma Pizza
      Our Minitab data can be found in the file
         Multiple Regression - Pizza.mpj
      Based on the data that we have collected, we are going to
         study the effects of total pizzas ordered, defects, and
         incorrect orders on delivery time

   Additional Practice Exercise:
   Customer Satisfaction
      Bob Black Belt would like to get a better understanding of the
         customer satisfaction data
      Use the data provided in the Minitab file
         A-06 Customer Satisfaction Data.mtw to create a Regression Model
         to predict Overall Satisfaction




               Each row of data is a monthly average of how customers rated the
               services on a scale of 1-10. For example, in January, the average
                    customer rating for Staff Responsiveness was 7.9.

   Additional Practice Exercise:
   Customer Satisfaction (Cont.)
      Consider Staff Responsiveness, Check-out Speed,
         Frequent Guest Program, and Problems Resolved as
         possible inputs that could be used to predict Overall
         Satisfaction.
              First, study correlation with a Matrix Plot and Correlation
               Table
              Next, create the initial Regression Model
              Find the best combination of inputs with Best Subsets
              Finally, run the reduced Regression Model

   Additional Practice Exercise:
   A Study of Supervisor Performance
         A recent survey of clerical employees in a large financial organization
          included questions related to employee satisfaction with their
          supervisors. The company was interested in any relationships between
          specific supervisor characteristics and overall satisfaction with
          supervisors, as perceived by the employees:
             Y = Overall rating of the job being done by the supervisor
             X1 = Handles employee complaints
             X2 = Does not allow special privileges
             X3 = Provides opportunity to learn new things
             X4 = Raises based on performance
             X5 = Too critical of poor performance
               X6 = Rate of advancing to better jobs (employee’s perception
                of their own advancement rate)

    Source: Regression Analysis by Example, Chatterjee and Price

   Additional Practice Exercise:
   A Study of Supervisor Performance
      The survey responses were on a scale of 1-5
      For purposes of analysis, a score of 1 or 2 was considered
        “favorable”, while a score of 3, 4, or 5 was considered “unfavorable”
      Data was collected from 30 departments, selected randomly from
        the organization. Each department had approximately 35 employees
        with one supervisor
      For each department, the data was aggregated, and the value
        recorded was the percent favorable for each item
      Data file is A-06 Attitude.mtw
      Questions:
          Can we predict the overall supervisor rating using this data?
          Which variable(s) have the strongest correlation with the supervisor rating?
          Are there any unusual observations?
          Comments on the data?

 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 

Último (20)

INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx4.9.24 School Desegregation in Boston.pptx
4.9.24 School Desegregation in Boston.pptx
 
ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6ICS 2208 Lecture Slide Notes for Topic 6
ICS 2208 Lecture Slide Notes for Topic 6
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 

NG BB 37 Multiple Regression

  • 1. UNCLASSIFIED / FOUO National Guard Black Belt Training Module 37: Multiple Regression
  • 2. CPI Roadmap – Analyze
      8-Step Process: 1. Validate the Problem; 2. Identify Performance Gaps; 3. Set Improvement Targets; 4. Determine Root Cause; 5. Develop Counter-Measures; 6. See Counter-Measures Through; 7. Confirm Results & Process; 8. Standardize Successful Processes
      Phases: Define, Measure, Analyze, Improve, Control
      Activities: Identify Potential Root Causes; Reduce List of Potential Root Causes; Confirm Root Cause to Output Relationship; Estimate Impact of Root Causes on Key Outputs; Prioritize Root Causes; Complete Analyze Tollgate
      Tools: Value Stream Analysis; Process Constraint ID; Takt Time Analysis; Cause and Effect Analysis; Brainstorming; 5 Whys; Affinity Diagram; Pareto; Cause and Effect Matrix; FMEA; Hypothesis Tests; ANOVA; Chi Square; Simple and Multiple Regression
      Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
  • 3. Learning Objectives
      - Understand how to identify correlation with multiple variables
      - Learn how to create a mathematical model for the effect of multiple inputs on an output variable
      - Understand and identify multicollinearity
      - Understand how to use Best Subsets to identify the best model
      - Examine unusual observations to learn more about the data
  • 4. Multiple Regression
      - In Simple Linear Regression, we had Y = f(X), that is, $Y = B_0 + B_1 X$
      - In Multiple Linear Regression, we have $Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \cdots + B_k X_k$
      [Diagram: inputs X1 through X5 each feeding into the output Y]
      - We'd like to identify which, if any, of the predictor variables are useful in predicting Y
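Minitab does the fitting for us, but it may help to see what "least squares" means for several Xs at once. The sketch below is a minimal illustration in plain Python, assuming a small made-up data set generated exactly from Y = 2 + 3X1 - 0.5X2; it solves the normal equations (X'X)B = X'Y. The helper names `solve` and `fit_ols` are hypothetical, not part of Minitab or any library.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [B0, B1, ..., Bk]; each row holds X1..Xk (no constant)."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up illustration data: Y = 2 + 3*X1 - 0.5*X2 exactly
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
y = [2 + 3 * x1 - 0.5 * x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)
print([round(c, 6) for c in coefs])  # -> [2.0, 3.0, -0.5]
```

Because the data contain no noise, the fit recovers B0, B1, and B2 exactly; with real process data the coefficients only estimate the underlying relationship.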
  • 5. When Should I Use Multiple Regression?
                                    Independent Variable (X)
      Dependent Variable (Y)        Continuous               Attribute
      Continuous                    Regression               ANOVA
      Attribute                     Logistic Regression      Chi-Square (χ²) Test
      The tool depends on the data type. Regression is typically used with a continuous input and a continuous response, but it may also be used with count or categorical inputs and outputs.
  • 6. Basic Steps for Regression Modeling
      Step | Tools | Objective | Key Question
      1 | Process Flowchart, SIPOC | To identify KPIVs and KPOVs | Which KPIVs will significantly improve which KPOVs?
      2 | Scatter Plot, Histogram | To visualize the data | Does it look like there is a C&E relationship?
      3 | Correlation, Hypothesis Test | To qualify the C&E relationship (strength, % variability, p-value) | How strong is the C&E relationship?
      4 | Regression Analysis (Method of Least Squares) | To quantify the C&E relationship | What is the prediction equation?
      5 | Residual Analysis | To validate the model selected | Is there anything suspicious with the model selected?
      KPIV = Key Process Input Variables; KPOV = Key Process Output Variables; C&E = cause and effect
  • 7. Example: Production Plant
      - A chemical engineer is investigating the amount of silver required in the high-volume production of contact switches for a new Army radio. Although only a small amount of silver is deposited on the switches, a larger amount is wasted through a multiple-step process. She has collected data (A-06 Production Plant) and would like to develop a prediction model.
      - Step 1: The KPIVs and the output variable are given below:
          X1 = Average temperature of rinse bath (degrees C)
          X2 = Speed of reel that feeds the switches through the line (inches/min)
          X3 = Thickness of silver deposit (angstroms)
          X4 = Water consumed (gallons per day)
          Y = Amount of silver consumed (pounds/day)
      - What questions would you ask about this data?
      Source: Applied Regression Analysis, Draper and Smith
  • 8. Visualize the Data!
      Step 2: Visualize the data
      Data file: A-06 Production Plant.mtw
      Select Graph > Matrix Plot
  • 9. Step 2: Visualize the Data!
      Looking for relationships between variables...
      This dialog box comes up first. Select Matrix of Plots – Simple, since we have only one (Y) variable and no groups.
      Click OK to go to the next dialog box.
  • 10. Step 2: Visualize the Data!
      Double-click each variable you want to include in the matrix to place it in the Graph variables box.
      Select Matrix Options to move on to the next dialog box.
  • 11. Step 2: Visualize the Data!
      Select Lower left to place all the graph labels at the lower left of the boxes.
      Click OK here and in the previous dialog box to get the matrix.
  • 12. Correlation Table
      There appear to be some relationships between certain variables and the response.
      [Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag; the Amt of Ag row is the response variable (Y)]
      Callout: Is this good or bad?
  • 13. Quantify the Relationships Between Variables
      Step 3: Quantify the relationship
      Select Stat > Basic Statistics > Correlation
  • 14. Correlation Matrix
      Evaluating coefficients of correlation among predictors...
      Double-click each variable you want to include to place it in the Variables box.
      Check Display p-values (the default setting).
      Click OK to get the correlation matrix in your Session Window.
  • 15. Correlation Matrix
      The TOP number in each pair is the Pearson coefficient of correlation (r-value), while the BOTTOM number is the p-value.
      Pairwise correlations between predictor variables larger than about 0.5 to 0.7 in absolute value are signs of trouble: multicollinearity. We will explain more shortly.
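The pairwise screen described above can be sketched in a few lines of Python. This is an illustration only: the column names and values are hypothetical (not the A-06 Production Plant data), and the 0.7 cutoff is one end of the 0.5 to 0.7 rule of thumb, exposed as a parameter.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear_pairs(columns, threshold=0.7):
    """Return (name1, name2, r) for predictor pairs whose |r| exceeds the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictor columns
predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],  # roughly 2 * x1
    "x3": [5, 3, 6, 2, 4, 1],
}
for a, b, r in flag_collinear_pairs(predictors):
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only the x1/x2 pair is flagged; x1 and x3 have r = -0.6, which falls in the "worth a look" zone of the rule of thumb but below this cutoff.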
  • 16. Finding the Regression Equation...
      Step 4: Develop a prediction model
      Select Stat > Regression > Regression
  • 17. Finding the Regression Equation... (Cont.)
      Double-click C5 Amt of Ag to place it in the Response box, then double-click each variable you want to place in the Predictors box.
      Select Options to go to the next dialog box.
  • 18. Finding the Regression Equation... (Cont.)
      In this dialog box, the only thing you have to do is check Variance inflation factors.
      Click OK here and in the previous dialog box to get the regression analysis in your Session Window.
  • 19. Regression Equation
      Minitab displays the following regression equation:
      Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
      Predictor      Coef   SE Coef      T      P     VIF
      Constant       5.72     10.83   0.53  0.607
      Temp       -0.01558   0.02616  -0.60  0.563   1.276
      Speed        0.2393    0.2644   0.90  0.383  10.997
      Thickness     0.443     1.033   0.43  0.675  11.671
      Water       0.04495   0.01481   3.04  0.010   1.731
      S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
      - The P-values indicate whether a particular predictor is significant in the presence of the other predictors in the model.
      - This model explains 80.9% of the variability in the response.
      - R-Sq(adj) adjusts for degrees of freedom, penalizing variables that have no real value. It should be used when comparing models.
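As a check on how R-Sq(adj) relates to R-Sq, the standard adjustment formula, with n = 17 observations (the N shown on the residual normal probability plot) and k = 4 predictors, reproduces the 74.5% above:

```latex
\begin{aligned}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \\
\text{full model } (n = 17,\ k = 4):\quad R^2_{\text{adj}} &= 1 - (1 - 0.809)\,\frac{16}{12} = 1 - 0.2547 \approx 0.745
\end{aligned}
```

With k = 2 and R-Sq = 80.0%, the same formula gives 1 - 0.200 × 16/14 ≈ 0.771, within rounding of the 77.2% reported for the Speed and Water model. Each added predictor shrinks n - k - 1, so a term that adds little to R-Sq lowers R-Sq(adj).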
  • 20. Interpreting P-values
      - The P column gives the p-value (observed significance level) for each term in the model
      - Typically, if a P value is less than or equal to 0.05, the variable is considered significant (i.e., the null hypothesis that its coefficient is zero is rejected)
      - If a P value is greater than 0.10, the term is removed from the model. A practitioner might leave the term in the model if the P value is in the gray region between these two probability levels
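The decision rule above is simple enough to write down directly. The sketch below applies it to the P values from the full-model output and also recomputes the T column as Coef / SE Coef. The function name `decide` is hypothetical; because the displayed coefficients are rounded, recomputed T values can differ in the last digit (Speed comes out as 0.91 rather than 0.90).

```python
# Coefficients, standard errors, and p-values copied from the full-model output
output = {
    # predictor: (Coef, SE Coef, P)
    "Temp": (-0.01558, 0.02616, 0.563),
    "Speed": (0.2393, 0.2644, 0.383),
    "Thickness": (0.443, 1.033, 0.675),
    "Water": (0.04495, 0.01481, 0.010),
}


def decide(p, alpha=0.05, remove_above=0.10):
    """Apply the rule of thumb: keep if p <= alpha, remove if p > remove_above, else judgment call."""
    if p <= alpha:
        return "significant"
    if p > remove_above:
        return "remove"
    return "gray region"


for name, (coef, se, p) in output.items():
    t = coef / se  # T column = Coef / SE Coef
    print(f"{name:<10} T = {t:6.2f}  P = {p:.3f}  -> {decide(p)}")
```

Only Water is significant in the full model; Temp, Speed, and Thickness all fall above 0.10.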
  • 21. Regression Equation
      Regression output in Minitab's Session Window (the same full-model output as slide 19), with the Variance Inflation Factor column highlighted:
      VIF: Temp 1.276, Speed 10.997, Thickness 11.671, Water 1.731
      High VIF values are signs of trouble (VIF > 10).
  • 22. Problems with Several Predictor Variables
      - Sometimes the Xs are correlated (dependent). This condition is known as multicollinearity.
      - Multicollinearity can cause problems (sometimes severe):
          - Estimates of the coefficients are affected (unstable, with inflated variances)
          - It is difficult to isolate the effect of each X
          - Coefficients depend on which Xs are included in the model
          - High multicollinearity inflates the standard errors of the coefficients, which increases the P values
      - In cases of extreme multicollinearity, Minitab will drop one term from the model and display a notice
  • 23. Graphical Representation of Multicollinearity
      [Diagram: overlapping circles for Total Variation in Y, Variation Explained by X1, and Variation Explained by X2]
      - Overlap represents correlation
      - X1 and X2 are both correlated with Y
      - X1 and X2 are highly correlated
      - If X1 is in the model, we don't need X2, and vice versa
  • 24. Assessing the Degree of Multicollinearity
      - We use a metric called the Variance Inflation Factor (VIF): $\mathrm{VIF}_i = \dfrac{1}{1 - R_i^2}$
      - In Minitab, select Stat > Regression > Regression > Options > Display variance inflation factors
      - Where $R_i^2$ is the $R^2$ value you get when you regress $X_i$ against the other Xs; a large $R_i^2$ suggests that the variable is redundant
      - Rules of thumb:
          - $R_i^2 > 0.9$ (VIF > 10): cause for concern, high degree of collinearity
          - $0.8 < R_i^2 < 0.9$ (5 < VIF < 10): moderate degree of collinearity
      - For the Production Plant data, Minitab gives us: Temp 1.276, Speed 10.997, Thickness 11.671, Water 1.731
      - Two VIFs are a bit large, but in this case, with an R-Sq of 80.9%, some multicollinearity can be tolerated
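To make the VIF definition concrete, here is a minimal sketch that regresses each predictor on all the others and applies VIF = 1 / (1 - R²). The data are made up (x3 is built to be almost exactly x1 + x2), and the helper names `solve`, `r_squared`, and `vifs` are hypothetical; Minitab's option computes the same quantity for you.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(rows, y):
    """R^2 of the least-squares fit of y on the columns in `rows`, with a constant."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vifs(columns):
    """VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing X_i on the other Xs."""
    names = list(columns)
    n = len(columns[names[0]])
    out = {}
    for target in names:
        others = [m for m in names if m != target]
        rows = [[columns[m][i] for m in others] for i in range(n)]
        out[target] = 1 / (1 - r_squared(rows, columns[target]))
    return out


# Hypothetical predictors: x3 is nearly x1 + x2, so all three are redundant
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
wiggle = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
x3 = [a + b + w for a, b, w in zip(x1, x2, wiggle)]
for name, v in vifs({"x1": x1, "x2": x2, "x3": x3}).items():
    print(f"{name}: VIF = {v:.1f}")

# Going the other way: Minitab's VIF of 10.997 for Speed implies R_i^2 of about 0.909
print(f"Speed R_i^2 = {1 - 1 / 10.997:.3f}")
```

Dropping x3 from this made-up set brings both remaining VIFs back toward 1, which is the same logic Best Subsets exploits.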
  • 25. Some Cautions About the Coefficients
      - Remember the prediction equation obtained earlier: Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
      - Relative importance of predictors cannot be determined from the size of their coefficients:
          - The coefficients are scale dependent
          - The coefficients are influenced by correlation among the predictor variables
      - If a high degree of multicollinearity exists, even the signs of the coefficients may be misleading
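A quick arithmetic illustration of scale dependence: suppose, hypothetically, that Water were recorded in thousands of gallons per day instead of gallons. The predicted Amt of Ag does not change, but the Water coefficient grows by a factor of 1000:

```latex
0.0449\,X_{\text{Water, gal}}
  = (0.0449 \times 1000)\,\frac{X_{\text{Water, gal}}}{1000}
  = 44.9\,X_{\text{Water, kgal}}
```

Nothing about Water's importance changed, only its unit, which is why coefficient size alone cannot rank the predictors.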
  • 26. Residual Analysis
      Step 5: Validate the selected model
      Select Stat > Regression > Regression
      Is there anything suspicious with this model?
  • 27. Residual Analysis (Cont.)
      Double-click C5 Amt of Ag to place it in the Response box, then double-click each variable you want to place in the Predictors box.
      Select Graphs to go to the next dialog box.
  • 28. Residual Analysis (Cont.)
      Select Four in one to get all four residual plots on one graph, or pick and choose the plots you want.
      Click OK here and in the previous dialog box to get the residual plots.
  • 29. Residual Analysis (Cont.)
      Not too bad overall…
      [Figure: Residual Plots for Amt of Ag with four panels: Normal Probability Plot (N = 17, AD = 0.249, P-Value = 0.705), Versus Fits, Histogram, Versus Order]
      If you want to see the value for any observation, just hold your cursor over that point.
  • 30. How to Address Multicollinearity
      - Eliminate one or more input variables
          - We'll look at a technique called Best Subsets Regression
      - Collect additional data
      - Use process knowledge to determine the principal relationship
      - Use DOE to further assess the multicollinearity
      - If neither of the correlated variables is significant, eliminate both from the analysis
  • 31. UNCLASSIFIED / FOUO Best Subsets Regression  Rather than relying on the p-values alone, the computer looks at all possible combinations of variables and prints the resulting model characteristics  Statistics like adjusted R-Sq and MSError will improve as important model terms are added, then worsen as “junk” terms are added to the model Multiple Regression UNCLASSIFIED / FOUO 31
  • 32. Best Subsets Regression Considerations
      - Objective: We want to select a model with predictive accuracy and minimum multicollinearity
      - Seek a compromise between:
          - Overfitting (including model terms with only marginal, or no, contribution)
          - Underfitting (ignoring or deleting relatively important model terms)
      [Figure: overfit vs. underfit]
      - What are some problems with overfitting?
      - What are some problems with underfitting?
  • 33. Best Subsets Regression: Evaluating Candidate Models
      Four things to look at when evaluating candidate models:
      1. R² (large R² is desired; however, R² increases whenever predictors are added to the model, so it should only be used for comparing models with the same number of terms)
      2. Adjusted R² (large is desired)
      3. Mallows' Cp statistic (small Cp is desired, close to the number of terms in the model including the constant)
      4. S (the estimate of the standard deviation around the regression)
      Generally, the best three models are selected and checked for significance of all factors and for the residual assumptions.
  • 34. More on the Mallows' Cp Statistic
      - In practice, the number of parameters the model needs is indicated by where Mallows' Cp reaches its minimum
      - Rule of thumb: we want Cp ≈ p, the number of terms in the model including the constant (number of input variables + 1)
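For reference, the standard definition of Mallows' Cp compares a candidate model's error sum of squares with the full model's mean square error, where p counts model terms including the constant. Plugging in the S values from the Best Subsets output for the Production Plant data (n = 17) reproduces the table's Cp of 1.5 for the Speed and Water model, and shows why the full model always lands exactly on Cp = p:

```latex
\begin{aligned}
C_p &= \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\text{full}}} - (n - 2p),
  \qquad \mathrm{SSE}_p = S_p^2\,(n - p), \quad \mathrm{MSE}_{\text{full}} = S_{\text{full}}^2 \\
C_p(\text{Speed, Water};\ p = 3) &= \frac{0.39047^2 \times 14}{0.412748^2} - (17 - 6)
  = \frac{2.1345}{0.17036} - 11 \approx 1.53 \\
C_p(\text{full};\ p = 5) &= (n - p) - (n - 2p) = p = 5
\end{aligned}
```

A candidate with Cp well above p is likely missing important terms (underfit); Cp near or below p suggests the dropped terms were not contributing much.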
  • 35. Best Subsets Regression
      Minitab data set: Production Plant
      Select Stat > Regression > Best Subsets
  • 36. Best Subsets Regression (Cont.)
      Enter the Response variable.
      Enter the Predictor variables (input variables).
      Click OK to get the analysis in the Session Window.
  • 37. Best Subsets Regression (Cont.)
      Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water
      Response is Amt of Ag
      Vars  R-Sq  R-Sq(adj)  Mallows Cp        S  Predictors
         1  64.4       62.0         9.4  0.50387
         1  62.3       59.8        10.7  0.51836
         2  80.0       77.2         1.5  0.39047  Speed, Water
         2  78.8       75.8         2.3  0.40200
         3  80.6       76.1         3.2  0.39959
         3  80.3       75.8         3.4  0.40237
         4  80.9       74.5         5.0  0.41275  Temp, Speed, Thickness, Water
      What model(s) are the best candidates?
  • 38. Best Subsets Regression (Cont.)
      (Same Best Subsets output as slide 37.)
      R-Sq: Look for the highest value when comparing models with the same number of input variables.
  • 39. Best Subsets Regression (Cont.)
      (Same Best Subsets output as slide 37.)
      R-Sq(adj): Look for the highest value when comparing models with different numbers of input variables.
  • 40. Best Subsets Regression (Cont.)
      (Same Best Subsets output as slide 37.)
      Cp: Look for models where Cp is small and close to the number of terms in the model, including the constant (number of input variables + 1).
  • 41. Best Subsets Regression (Cont.)
      (Same Best Subsets output as slide 37.)
      S: We want S, the estimate of the standard deviation about the regression, to be as small as possible.
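The four statistics above can all be computed from each candidate model's error sum of squares. The sketch below enumerates every subset of predictors, the way Best Subsets does, and reports R-Sq, R-Sq(adj), Cp, and S. It is a minimal illustration on hypothetical data (y is built from x1 and x2; x3 is a "junk" predictor), not Minitab's implementation, and the helper names are made up.

```python
from itertools import combinations
from math import sqrt


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(names, columns, y):
    """Error sum of squares for the least-squares fit of y on the named predictors plus a constant."""
    design = [[1.0] + [columns[m][i] for m in names] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_subsets(columns, y):
    """(vars, R-Sq, R-Sq(adj), Cp, S, predictors) for every non-empty subset of predictors."""
    n = len(y)
    names = list(columns)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse_full = sse(names, columns, y) / (n - len(names) - 1)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            p = k + 1  # number of terms, including the constant
            err = sse(subset, columns, y)
            r2 = 1 - err / sst
            r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
            cp = err / mse_full - (n - 2 * p)
            results.append((k, r2, r2_adj, cp, sqrt(err / (n - p)), subset))
    return results


# Hypothetical data: y depends on x1 and x2; x3 is a "junk" predictor
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2, 5, 1, 7, 3, 8, 4, 9, 6, 10]
x3 = [5, 3, 8, 1, 9, 2, 7, 4, 10, 6]
noise = [0.03, -0.02, 0.01, -0.04, 0.02, 0.0, -0.01, 0.03, -0.03, 0.01]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
columns = {"x1": x1, "x2": x2, "x3": x3}

print("Vars   R-Sq  R-Sq(adj)     Cp       S  Predictors")
for k, r2, r2_adj, cp, s, subset in best_subsets(columns, y):
    print(f"{k:>4}  {100 * r2:5.1f}  {100 * r2_adj:9.1f}  {cp:5.1f}  {s:.4f}  {', '.join(subset)}")
```

In this made-up example the x1 + x2 model should stand out among the two-variable models, and the full model gets Cp = 4 (its p) by construction, mirroring the 5.0 for the four-variable Production Plant model.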
  • 42. Once the Candidate Models Are Identified
      - Evaluate the candidate models under a "microscope":
          - Outliers
          - High leverage
          - Influential observations
          - Residuals
          - Prediction quality
      - Once a model has been selected, find the new regression equation
      - Test its predictive capability for observations NOT originally used in the modeling
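One simple way to test predictive capability on observations not used in the modeling is a hold-out check: fit on some rows, predict the rest, and look at the prediction error. The sketch below assumes hypothetical data built from Y = 4 + 1.5X1 + 2X2 plus small deviations and holds out every third row; `fit_ols`, `predict`, and `holdout_rmse` are illustrative helpers, not Minitab features. Comparing the hold-out RMSE with the model's S is one informal way to see whether the model predicts new data about as well as it fits the old data.

```python
from math import sqrt


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [B0, B1, ...] with a constant term."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Fitted value B0 + B1*X1 + ... for one row of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def holdout_rmse(rows, y, holdout):
    """Fit on the rows not in `holdout`, then return the RMSE of predictions on the held-out rows."""
    train = [i for i in range(len(y)) if i not in holdout]
    coefs = fit_ols([rows[i] for i in train], [y[i] for i in train])
    errors = [y[i] - predict(coefs, rows[i]) for i in holdout]
    return sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical data: Y = 4 + 1.5*X1 + 2*X2 plus small deviations
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
deviation = [0.2, -0.1, 0.0, 0.1, -0.2, 0.1, -0.1, 0.0, 0.1, -0.1, 0.2]
rows = list(zip(x1, x2))
y = [4 + 1.5 * a + 2 * b + d for (a, b), d in zip(rows, deviation)]
holdout = [i for i in range(len(y)) if i % 3 == 2]  # rows 2, 5, 8 are held out
print(f"hold-out RMSE = {holdout_rmse(rows, y, holdout):.3f}")
```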
  • 43. Regression with Reduced Model
      We select the best model with two variables, Speed and Water, and run Minitab again to obtain the new regression equation:
      Select Stat > Regression > Regression
  • 44. Regression with Reduced Model (Cont.)
      Enter Amt of Ag as the Response.
      Enter only Speed and Water as Predictors.
      Click OK to get the analysis in the Session Window.
  • 45. Regression with Reduced Model (Cont.)
      Minitab's Session Window yields the following regression equation for the reduced model:
      Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
      Predictor      Coef   SE Coef      T      P
      Constant      9.919     1.694   5.86  0.000
      Speed       0.35689   0.08544   4.18  0.001
      Water       0.04253   0.01206   3.53  0.003
      S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
      …to compare with the previous model:
      Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
      Predictor      Coef   SE Coef      T      P
      Constant       5.72     10.83   0.53  0.607
      Temp       -0.01558   0.02616  -0.60  0.563
      Speed        0.2393    0.2644   0.90  0.383
      Thickness     0.443     1.033   0.43  0.675
      Water       0.04495   0.01481   3.04  0.010
      S = 0.4127   R-Sq = 80.9%   R-Sq(adj) = 74.5%
  • 46. Unusual Observations
      Minitab's Session Window also gives us the following output:
      Unusual Observations
      Obs  Speed  Amt of Ag      Fit  SE Fit  Residual  St Resid
        3   11.5    21.0000  20.3784  0.2477    0.6216     2.06R
      R denotes an observation with a large standardized residual.
      Here, an unusual observation means one with a large standardized residual. Let's see what would happen if we eliminated such an observation from our collected data!
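The St Resid value can be reproduced from the other columns using the standard relations SE Fit = S·√h and St Resid = Residual / (S·√(1 − h)), where h is the observation's leverage. The short check below uses observation 3's values and S from the reduced-model output, with p = 3 terms and n = 17 observations; it also compares the leverage with the 2p/n guideline from the regression checklist. Nothing here is Minitab code; it is just the arithmetic.

```python
from math import sqrt

# Values for observation 3, copied from the reduced-model output
s = 0.3905          # S for the Speed + Water model
se_fit = 0.2477     # SE Fit for observation 3
residual = 0.6216   # Residual for observation 3
n, p = 17, 3        # observations; terms including the constant

leverage = (se_fit / s) ** 2                    # from SE Fit = S * sqrt(h)
st_resid = residual / (s * sqrt(1 - leverage))  # standardized residual

print(f"leverage h = {leverage:.3f}  (2p/n = {2 * p / n:.3f})")
print(f"standardized residual = {st_resid:.2f}")
```

The recomputed standardized residual matches the 2.06 Minitab reports, and the implied leverage of about 0.40 also exceeds 2p/n ≈ 0.35, so by the checklist's rule this point is unusual in X as well as in Y.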
  • 47. Impact of the Unusual Observation
      Without the unusual observation, Minitab's Session Window yields the following regression equation:
      Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water
      Predictor      Coef   SE Coef      T      P
      Constant      8.610     1.567   5.49  0.000
      Speed       0.23698   0.08960   2.64  0.020
      Water       0.05775   0.01226   4.71  0.000
      S = 0.3383   R-Sq = 85.0%   R-Sq(adj) = 82.7%
      R-Sq goes up a little because we've gotten rid of "noise" in the model.
      …to compare with the regression equation of our previous reduced model:
      Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
      Predictor      Coef   SE Coef      T      P
      Constant      9.919     1.694   5.86  0.000
      Speed       0.35689   0.08544   4.18  0.001
      Water       0.04253   0.01206   3.53  0.003
      S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
  • 48. Takeaways
      - Regression analysis can be used with historical data as well as data from designed experiments to build prediction models
      - Care must be exercised when using historical data:
          - Correlation does not imply a cause-and-effect relationship
          - There may be serious problems with multicollinearity and high-leverage observations
      - There are several diagnostic tools available to evaluate regression models:
          - Fit: R², adjusted R², Cp, S
          - Unusual observations: residual plots, leverage, Cook's distance (Cook's D)
          - Multicollinearity: VIFs (Variance Inflation Factors)
  • 49. Considerations in Regression
      - Set goals before doing the analysis (what do you want to learn, how well do you need to predict, etc.).
      - Gather enough observations to adequately measure error and check the model assumptions.
      - Make sure that the sample of data is representative of the population.
      - Excessive measurement error of the inputs (Xs) creates uncertainty in the estimated coefficients, predictions, etc.
      - Be sure to collect data on all potentially important explanatory variables.
  • 50. Regression Checklist
      - Scatterplots (Y vs. X)
      - Histograms and/or boxplots of Ys and Xs
      - Coefficients
      - Significance (p < 0.05 to 0.10)
      - R² and adjusted R²
      - S
      - Residuals (no obvious pattern)
      - Unusual Y values (|standardized residual| > 2)
      - Unusual X values (leverage > 2p/n)
      - Overfitting vs. underfitting (Cp ≈ p, the number of terms in the model including the constant)
      - Multicollinearity (VIF > 5 to 10)
  • 51. What other comments or questions do you have?
  • 52. References
      - Neter, Wasserman, and Kutner, Applied Linear Regression Models, Irwin, 1989.
      - Draper and Smith, Applied Regression Analysis, Wiley, 1981.
      - Schulman, Robert S., Statistics in Plain English, Chapman and Hall, 1992.
      - Gunst and Mason, Regression Analysis and Its Application, Marcel Dekker, 1980.
      - Myers, Raymond H., Classical and Modern Regression with Applications, Duxbury, 1990.
      - Dielman, Applied Regression Analysis for Business and Economics, Duxbury, 1991.
      - Hosmer and Lemeshow, Applied Logistic Regression, Wiley, 1989.
      - Iglewicz and Hoaglin, How to Detect and Handle Outliers, ASQ Press.
      - Crocker, Douglas C., How to Use Regression Analysis in Quality Control, ASQ Press.
  • 53. National Guard Black Belt Training APPENDIX: Additional Exercises
      - Anthony's Pizza
      - Customer Satisfaction
      - A Study of Supervisor Performance
  • 54. Additional Practice Example: Anthony's Pizza
      - We have received Voice of the Customer feedback telling us that customers are dissatisfied if we cannot accurately predict the time of their pizza delivery when it is beyond the 30-minute target
      - We would like to develop a model so that when the customer calls, we can accurately predict delivery time
  • 55. Additional Practice Example: Six Sigma Pizza
      - Our Minitab data can be found in the file Multiple Regression - Pizza.mpj
      - Based on the data that we have collected, we are going to study the effects of total pizzas ordered, defects, and incorrect order on delivery time
  • 56. Additional Practice Exercise: Customer Satisfaction
      - Bob Black Belt would like to get a better understanding of the customer satisfaction data
      - Use the data provided in the Minitab file A-06 Customer Satisfaction Data.mtw to create a regression model to predict Overall Satisfaction
      Each row of data is a monthly average of how customers rated the services on a scale of 1-10. For example, in January, the average customer rating for Staff Responsiveness was 7.9.
  • 57. Additional Practice Exercise: Customer Satisfaction (Cont.)
      - Consider Staff Responsiveness, Check-out Speed, Frequent Guest Program, and Problems Resolved as possible inputs that could be used to predict Overall Satisfaction
          - First, study correlation with a Matrix Plot and Correlation Table
          - Next, create the initial regression model
          - Find the best combination of inputs with Best Subsets
          - Finally, run the reduced regression model
  • 58. Additional Practice Exercise: A Study of Supervisor Performance
      - A recent survey of clerical employees in a large financial organization included questions related to employee satisfaction with their supervisors. The company was interested in any relationships between specific supervisor characteristics and overall satisfaction with supervisors as perceived by the employees.
          Y = Overall rating of the job being done by the supervisor
          X1 = Handles employee complaints
          X2 = Does not allow special privileges
          X3 = Provides opportunity to learn new things
          X4 = Raises based on performance
          X5 = Too critical of poor performance
          X6 = Rate of advancing to better jobs (employee's perception of their own advancement rate)
      Source: Regression Analysis by Example, Chatterjee and Price
  • 59. Additional Practice Exercise: A Study of Supervisor Performance (Cont.)
      - The survey responses were on a scale of 1-5
      - For purposes of analysis, a score of 1 or 2 was considered "favorable", while a score of 3, 4, or 5 was considered "unfavorable"
      - Data were collected from 30 departments, selected randomly from the organization. Each department had approximately 35 employees with one supervisor
      - For each department, the data were aggregated, and the value recorded was the percent favorable for each item
      - Data file: A-06 Attitude.mtw
      - Questions:
          - Can we predict the overall supervisor rating using this data?
          - What variable(s) have the strongest correlation with the supervisor rating?
          - Are there any unusual observations?
          - Comments on the data?