This document provides an overview of data analysis techniques, including analysis of variance (ANOVA), regression, correlation, and multivariate statistical analysis. It covers interpreting ANOVA tables, regression output, and correlation matrices, and introduces exploratory factor analysis, multiple discriminant analysis, and cluster analysis, with examples of interpreting statistical output.
MBA2216 Week 11 Data Analysis Part 02
1. Data Analysis Part 2: Variances, Regression, Correlation
MBA2216 BUSINESS RESEARCH PROJECT
by
Stephen Ong
Visiting Fellow, Birmingham City University, UK
Visiting Professor, Shenzhen University
2. LEARNING OUTCOMES
17. Understand the concept of analysis of variance (ANOVA)
18. Interpret an ANOVA table
19. Apply and interpret simple bivariate correlations
22. Interpret a correlation matrix
23. Understand simple (bivariate) regression
24. Understand the least-squares estimation technique
25. Interpret regression output, including the tests of hypotheses tied to specific parameter coefficients
27. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis
3. LEARNING OUTCOMES (cont'd)
30. Interpret basic exploratory factor analysis results
31. Know what multiple discriminant analysis can be used to do
32. Understand how cluster analysis can identify market segments
4. Remember this: Garbage in, garbage out!
If data are collected improperly or coded incorrectly, the research results are “garbage”.
6. Relationship Among t Test, Analysis of Variance, Analysis of Covariance, & Regression
All of these techniques assume one or more metric (interval-scaled) dependent variables; they differ in the independent variables they accommodate:
One binary independent variable → t Test
Categorical (factorial), one factor → One-way analysis of variance
Categorical (factorial), more than one factor → N-way analysis of variance
Categorical and interval independent variables → Analysis of covariance
Interval independent variables → Regression
7. The Z-Test for Comparing
Two Proportions
Z-Test for Differences of Proportions
Tests the hypothesis that proportions are
significantly different for two independent
samples or groups.
Requires a sample size greater than thirty.
The hypothesis is H0: π1 = π2, which may be restated as H0: π1 − π2 = 0.
8. The Z-Test for Comparing Two
Proportions
Z-test statistic for differences in large random samples:
Z = [(p1 − p2) − (π1 − π2)] / Sp1−p2
where:
p1 = sample proportion of successes in Group 1
p2 = sample proportion of successes in Group 2
(π1 − π2) = hypothesized population proportion 1 minus hypothesized population proportion 2
Sp1−p2 = pooled estimate of the standard error of the difference in proportions
9. The Z-Test for Comparing Two
Proportions
To calculate the standard error of the difference in proportions:
Sp1−p2 = √[ p̄q̄ (1/n1 + 1/n2) ]
where p̄ is the pooled proportion of successes across both samples and q̄ = 1 − p̄.
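For readers who want to verify the arithmetic outside SPSS, here is a minimal Python sketch of this Z-test. The group counts are made-up values for illustration, not data from the slides.

```python
import math
from scipy import stats

# Hypothetical counts: x successes out of n trials in each group
x1, n1 = 45, 100
x2, n2 = 30, 100

p1, p2 = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)       # pooled proportion of successes
q_bar = 1 - p_bar
se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))  # S_p1-p2

z = (p1 - p2) / se                   # tests H0: pi1 - pi2 = 0
p_value = 2 * stats.norm.sf(abs(z))  # two-tailed p-value
print(f"Z = {z:.3f}, p = {p_value:.4f}")
```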
10. One-Way Analysis of
Variance (ANOVA)
Analysis of Variance (ANOVA)
An analysis involving the investigation of the
effects of one treatment variable on an
interval-scaled dependent variable.
A hypothesis-testing technique to determine
whether statistically significant differences in
means occur between two or more groups.
A method of comparing variances to make
inferences about the means.
The substantive hypothesis tested is:
At least one group mean is not equal to another
group mean.
11. Partitioning Variance in
ANOVA
Total Variability
Grand Mean
The mean of a variable over all observations.
SST = Total of (observed value − grand mean)²
12. Partitioning Variance in ANOVA
Between-Groups Variance
The sum of differences between the group mean
and the grand mean summed over all groups for a
given set of observations.
SSB = Total of ngroup(group mean − grand mean)²
Within-Group Error or Variance
The sum of the differences between observed
values and the group mean for a given set of
observations
Also known as total error variance.
SSE = Total of (observed value − group mean)²
13. The F-Test
F-Test
Used to determine whether there is more
variability in the scores of one sample than
in the scores of another sample.
Variance components (SSE, SSB, SST) are used to compute F-ratios:
F = Variance between groups / Variance within groups
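The partitioning above can be verified with a short Python sketch; the three groups of scores are invented for illustration, and scipy's f_oneway should reproduce the manually computed F-ratio.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three treatment groups (made-up values)
g1 = np.array([8.0, 9, 7, 10, 9])
g2 = np.array([6.0, 7, 5, 6, 7])
g3 = np.array([9.0, 10, 11, 10, 9])
groups = [g1, g2, g3]

grand_mean = np.concatenate(groups).mean()
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
df_b = len(groups) - 1
df_w = sum(len(g) for g in groups) - len(groups)
f_manual = (ssb / df_b) / (sse / df_w)

f_lib, p = stats.f_oneway(g1, g2, g3)  # should match f_manual
print(f_manual, f_lib, p)
```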
16. SPSS Windows
One-way ANOVA can be efficiently
performed using the program COMPARE
MEANS and then One-way ANOVA. To
select this procedure using SPSS for
Windows, click:
Analyze>Compare Means>One-Way ANOVA …
N-way analysis of variance and analysis of
covariance can be performed using
GENERAL LINEAR MODEL. To select this
procedure using SPSS for Windows, click:
Analyze>General Linear Model>Univariate …
17. SPSS Windows: One-Way
ANOVA
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion [promotion]” to the FACTOR box.
5. Click OPTIONS.
6. Click Descriptive.
7. Click CONTINUE.
8. Click OK.
18. SPSS Windows: Analysis of Covariance
1. Select ANALYZE from the SPSS menu bar.
2. Click GENERAL LINEAR MODEL and then UNIVARIATE.
3. Move “Sales [sales]” into the DEPENDENT VARIABLE box.
4. Move “In-Store Promotion [promotion]” to the FIXED FACTOR(S) box. Then move “Coupon [coupon]” also to the FIXED FACTOR(S) box.
5. Move “Clientel [clientel]” to the COVARIATE(S) box.
6. Click OK.
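Outside SPSS, roughly the same analysis of covariance can be sketched with Python's statsmodels. The file name and column names (sales, promotion, coupon, clientel) are assumptions that mirror the SPSS variable labels above.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data set mirroring the SPSS example
df = pd.read_csv("store_data.csv")  # assumed file name

# Two categorical fixed factors plus one metric covariate
model = smf.ols("sales ~ C(promotion) + C(coupon) + clientel", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # ANCOVA-style table
```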
19. The Basics
Measures of Association
Refers to a number of bivariate statistical
techniques used to measure the strength of a
relationship between two variables.
The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated.
Correlation analysis is most appropriate for
interval or ratio variables.
Regression can accommodate either less-
than interval or interval independent
variables, but the dependent variable must
be continuous.
21. Simple Correlation Coefficient
(continued)
Correlation coefficient
A statistical measure of the covariation, or
association, between two at-least interval
variables.
Covariance
Extent to which two variables are
associated systematically with each other.
r_xy = r_yx = Σ(Xi − X̄)(Yi − Ȳ) / √[ Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ]
where the sums run over i = 1, …, n.
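A minimal Python check of this formula (the paired observations are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.7, 5.1, 8.2, 9.6])

# Manual computation mirroring the formula above
r_manual = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)

r, p = stats.pearsonr(x, y)  # library equivalent, with a significance test
print(r_manual, r, p)
print(r ** 2)  # squaring r gives the coefficient of determination (R²)
```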
22. Simple Correlation Coefficient
Correlation coefficient (r)
Ranges from +1 to -1
Perfect positive linear relationship = +1
Perfect negative (inverse) linear relationship =
-1
No correlation = 0
Correlation coefficient for two
variables (X,Y)
24. Correlation, Covariance, and
Causation
When two variables covary (i.e. vary
systematically), they display
concomitant variation.
This systematic covariation does not in and of itself establish causality.
e.g., the rooster's crow and the rising of the sun: the rooster does not cause the sun to rise.
25. Coefficient of Determination
Coefficient of Determination (R²)
A measure obtained by squaring the correlation coefficient; the proportion of the total variance of one variable that is accounted for by knowing the value of another variable.
Measures that part of the total variance of Y that is accounted for by knowing the value of X.
R² = Explained variance / Total variance
26. Correlation Matrix
Correlation matrix
The standard form for reporting correlation
coefficients for more than two variables.
Statistical Significance
The procedure for determining statistical
significance is the t-test of the significance
of a correlation coefficient.
28. Regression Analysis
Simple (Bivariate) Linear Regression
A measure of linear association that
investigates straight-line relationships
between a continuous dependent variable and
an independent variable that is usually
continuous, but can be a categorical dummy
variable.
The Regression Equation (Y = α + βX )
Y = the continuous dependent variable
X = the independent variable
α = the Y intercept (regression line intercepts
Y axis)
β = the slope coefficient (rise over run)
30. The Regression Equation
Parameter Estimate Choices
β is indicative of the strength and direction of the
relationship between the independent and
dependent variable.
α (Y intercept) is a fixed point that is considered
a constant (how much Y can exist without X)
Standardized Regression Coefficient (β)
Estimated coefficient of the strength of
relationship between the independent and
dependent variables.
Expressed on a standardized scale where higher
absolute values indicate stronger relationships
(range is from -1 to 1).
31. The Regression Equation (cont’d)
Parameter Estimate Choices
Raw regression estimates (b1)
Raw regression weights have the advantage of retaining
the scale metric—which is also their key disadvantage.
If the purpose of the regression analysis is forecasting,
then raw parameter estimates must be used.
This is another way of saying when the researcher is
interested only in prediction.
Standardized regression estimates (β)
Standardized regression estimates have the advantage
of a constant scale.
Standardized regression estimates should be used when
the researcher is testing explanatory hypotheses.
35. Ordinary Least-Squares
(OLS) Method of Regression
Analysis OLS
Guarantees that the resulting straight line will produce the
least possible total error in using X to predict Y.
Generates a straight line that minimizes the sum of
squared deviations of the actual values from this predicted
regression line.
No straight line can completely represent every dot in the
scatter diagram.
There will be a discrepancy between most of the actual scores (each dot) and the predicted scores, Ŷ.
Uses the criterion of attempting to make the least amount
of total error in prediction of Y from X.
37. Ordinary Least-Squares Method
of Regression Analysis (OLS)
(cont’d)
The equation Yi = α̂ + β̂Xi + ei means that the value of Y for any value of X (Xi) is determined by the estimated slope coefficient times Xi, plus the estimated intercept coefficient, plus some error (residual) ei.
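A short Python sketch of a bivariate OLS fit, with invented (X, Y) data; scipy.stats.linregress returns the intercept, slope, R², and the t-test p-value discussed later:

```python
import numpy as np
from scipy import stats

# Hypothetical (X, Y) observations
x = np.array([1.0, 2, 3, 4, 5, 6])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.2])

res = stats.linregress(x, y)   # least-squares estimates of alpha and beta
y_hat = res.intercept + res.slope * x
residuals = y - y_hat          # the errors e_i the slide refers to

print(f"alpha = {res.intercept:.3f}, beta = {res.slope:.3f}")
print(f"R² = {res.rvalue ** 2:.3f}, p-value for beta: {res.pvalue:.4f}")
```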
40. Ordinary Least-Squares Method
of Regression Analysis (OLS)
(cont’d)
Statistical significance of the regression model is assessed with an overall F-test, summarized in an ANOVA table (shown as a figure on the original slide).
41. Ordinary Least-Squares Method
of Regression Analysis (OLS)
(cont’d)
R²
The proportion of variance in Y that is
explained by X (or vice versa)
A measure obtained by squaring the
correlation coefficient; that proportion of
the total variance of a variable that is
accounted for by knowing the value of
another variable.
R² = 3,398.49 / 3,882.40 = 0.875
44. Simple Regression and
Hypothesis Testing
The explanatory power of regression lies
in hypothesis testing. Regression is often
used to test relational hypotheses.
The outcome of the hypothesis test involves
two conditions that must both be satisfied:
The regression weight must be in the hypothesized
direction. Positive relationships require a positive
coefficient and negative relationships require a
negative coefficient.
The t-test associated with the regression weight
must be significant.
45. What is Multivariate Data
Analysis?
Research that involves three or more
variables, or that is concerned with underlying
dimensions among multiple variables, will
involve multivariate statistical analysis.
Methods analyze multiple variables or even
multiple sets of variables simultaneously.
Many business problems involve multivariate data analysis, for example:
most employee motivation research
customer psychographic profiles
research that seeks to identify viable market segments
46. The “Variate” in Multivariate
Variate
A mathematical way in which a set of
variables can be represented with one
equation.
A linear combination of variables, each
contributing to the overall meaning of the
variate based upon an empirically derived
weight.
A function of the measured variables involved
in an analysis: Vk = f (X1, X2, . . . , Xm )
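A small numpy illustration of a variate as a linear combination; the weights here are arbitrary assumptions, not empirically derived:

```python
import numpy as np

# Rows = respondents, columns = measured variables X1..X3 (invented data)
X = np.array([[3.0, 5.0, 2.0],
              [4.0, 1.0, 6.0]])
weights = np.array([0.4, 0.3, 0.3])  # illustrative, not estimated

V = X @ weights  # V = 0.4*X1 + 0.3*X2 + 0.3*X3 for each respondent
print(V)
```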
48. Classifying Multivariate Techniques
Dependence Techniques
Explain or predict one or more dependent
variables.
Needed when hypotheses involve
distinction between independent and
dependent variables.
Types:
Multiple regression analysis
Multiple discriminant analysis
Multivariate analysis of variance
Structural equations modeling
49. Classifying Multivariate
Techniques (cont’d)
Interdependence Techniques
Give meaning to a set of variables or seek
to group things together.
Used when researchers examine
questions that do not distinguish between
independent and dependent variables.
Types:
Factor analysis
Cluster analysis
Multidimensional scaling
50. Classifying Multivariate
Techniques (cont’d)
Influence of Measurement Scales
The nature of the measurement scales will
determine which multivariate technique is
appropriate for the data.
Selection of a multivariate technique
requires consideration of the types of
measures used for both independent and
dependent sets of variables.
Nominal and ordinal scales are nonmetric.
Interval and ratio scales are metric.
53. Analysis of Dependence
General Linear Model (GLM)
A way of explaining and predicting a dependent variable based on fluctuations (variation) from its mean due to changes in independent variables; in general form, Ŷ = μ + ∆X + ∆F + ∆XF, where:
μ = a constant (overall mean of the dependent variable)
∆X and ∆F = changes due to main-effect independent variables (experimental variables) and blocking independent variables (covariates or grouping variables)
∆XF = the change due to the combination (interaction effect) of those variables
54. Interpreting Multiple Regression
Multiple Regression Analysis
An analysis of association in which the
effects of two or more independent variables
on a single, interval-scaled dependent
variable are investigated simultaneously.
Yi = b0 + b1X1 + b2X2 + b3X3 + … + bnXn + ei
• Dummy variable
The way a dichotomous (two group) independent
variable is represented in regression analysis by
assigning a 0 to one group and a 1 to the other.
55. Multiple Regression Analysis
A Simple Example
Assume that a toy manufacturer wishes to explain
store sales (dependent variable) using a sample
of stores from Canada and Europe.
Several hypotheses are offered:
H1: Competitor's sales are related negatively to sales.
H2: Sales are higher in communities that have a sales office than in communities without one.
H3: Grammar school enrollment in a community is related positively to sales.
56. Multiple Regression Analysis
(cont’d)
Statistical Results of the Multiple
Regression
Regression Equation:
Ŷ = 102.18 + 0.387X1 + 115.2X2 + 6.73X3
Coefficient of multiple determination (R²) = 0.845
F-value = 14.6, p < 0.05
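A hedged statsmodels sketch of how such a multiple regression could be run in Python; the file and column names (sales, comp_sales, office, enrollment) are assumptions standing in for the variables in the example:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 'office' is the 0/1 dummy for a sales office
df = pd.read_csv("toy_stores.csv")  # assumed file name

model = smf.ols("sales ~ comp_sales + office + enrollment", data=df).fit()
print(model.summary())  # b coefficients, t-tests, R², and the overall F-test
```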
57. Multiple Regression Analysis
(cont’d)
Regression Coefficients in Multiple
Regression
Partial correlation
The correlation between two variables after taking
into account the fact that they are correlated with
other variables too.
R² in Multiple Regression
The coefficient of multiple determination in
multiple regression indicates the percentage
of variation in Y explained by all independent
variables.
58. Multiple Regression Analysis (cont'd)
Statistical Significance in Multiple
Regression
F-test
Tests statistical significance by comparing the
variation explained by the regression equation
to the residual error variation.
Allows for testing of the relative magnitudes
of the sum of squares due to the regression
(SSR) and the error sum of squares (SSE).
F = MSR / MSE = (SSR / k) / (SSE / (n − k − 1))
59. Multiple Regression Analysis
(cont’d)
Degrees of Freedom (d.f.)
k = number of independent variables
n = number of observations or
respondents
Calculating Degrees of Freedom (d.f.)
d.f. for the numerator = k
d.f. for the denominator = n - k - 1
62. ANOVA (n-way) and MANOVA
Multivariate Analysis of Variance
(MANOVA)
A multivariate technique that predicts
multiple continuous dependent variables
with multiple categorical independent
variables.
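A minimal statsmodels sketch of a MANOVA, assuming a data frame with two continuous dependent variables (sat, loyalty) and two categorical factors (segment, region); all names are hypothetical:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("survey.csv")  # assumed file name

m = MANOVA.from_formula("sat + loyalty ~ C(segment) + C(region)", data=df)
print(m.mv_test())  # Wilks' lambda, Pillai's trace, and related tests
```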
63. ANOVA (n-way) and MANOVA
(cont’d)
Interpreting N-way (Univariate) ANOVA
1. Examine overall model F-test result. If
significant, proceed.
2. Examine individual F-tests for individual
variables.
3. For each significant categorical independent
variable, interpret the effect by examining the
group means.
4. For each significant, continuous covariate,
interpret the parameter estimate (b).
5. For each significant interaction, interpret the
means for each combination.
64. Discriminant Analysis
A statistical technique for predicting the
probability that an object will belong in
one of two or more mutually exclusive
categories (dependent variable), based
on several independent variables.
To calculate discriminant scores, the linear function used is:
Zi = b1X1i + b2X2i + … + bnXni
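A small scikit-learn sketch of two-group discriminant analysis on invented data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical objects: columns are independent variables, y is the category
X = np.array([[2.0, 3.5], [1.8, 3.0], [2.2, 3.8],
              [5.0, 7.1], [5.5, 6.8], [4.8, 7.4]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                        # the b weights of the linear function
print(lda.predict([[3.0, 4.0]]))        # predicted category for a new object
print(lda.predict_proba([[3.0, 4.0]]))  # membership probabilities
```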
67. Factor Analysis
Statistically identifies a reduced number
of factors from a larger number of
measured variables.
Types:
Exploratory factor analysis (EFA)—performed
when the researcher is uncertain about how
many factors may exist among a set of
variables.
Confirmatory factor analysis (CFA)—
performed when the researcher has strong
theoretical expectations about the factor
structure before performing the analysis.
69. Factor Analysis (cont’d)
How Many Factors
Eigenvalues are a measure of how much
variance is explained by each factor.
Common rule:
Base the number of factors on the number of
eigenvalues greater than 1.0.
Factor Loading
Indicates how strongly a measured
variable is correlated with a factor.
70. Factor Analysis (cont’d)
Factor Rotation
A mathematical way of simplifying factor
analysis results to better identify which
variables “load on” which factors.
Most common procedure is varimax rotation.
Data Reduction Technique
Approaches that summarize the information
from many variables into a reduced set of
variates formed as linear combinations of
measured variables.
The rule of parsimony: an explanation
involving fewer components is better than
one involving many more.
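The eigenvalue rule and varimax rotation can be sketched in Python as below. Random numbers stand in for real survey responses, so the extracted loadings are meaningless; the point is the workflow.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Stand-in data: 100 respondents x 6 measured variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))

# Common rule: retain factors whose correlation-matrix eigenvalues exceed 1.0
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
n_factors = max(1, int((eigenvalues > 1.0).sum()))

fa = FactorAnalysis(n_components=n_factors, rotation="varimax").fit(X)
loadings = fa.components_.T                   # rows = variables, cols = factors
communalities = (loadings ** 2).sum(axis=1)   # per-variable communality
print(loadings, communalities)
```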
71. Factor Analysis (cont’d)
Creating Composite Scales with Factor
Results
When a clear pattern of loadings exists,
the researcher may take a simpler
approach by summing the variables with
high loadings and creating a summated
scale.
Very low loadings suggest a variable does not
contribute much to the factor.
The reliability of each summated scale is tested
by computing a coefficient alpha estimate.
72. Factor Analysis (cont’d)
Communality
A measure of the percentage of a
variable’s variation that is explained by the
factors.
A relatively high communality indicates
that a variable has much in common with
the other variables taken as a group.
Communality for any variable is equal to
the sum of the squared loadings for that
variable.
73. Factor Analysis (cont’d)
Total Variance Explained
Squaring each variable's loading on a factor and summing them, then dividing the total by the number of variables, provides an estimate of the proportion of variance in the set of variables explained by that factor.
This explanation of variance is much the same as R² in multiple regression.
76. SPSS Windows: Principal Components
1. Select ANALYZE from the SPSS menu bar.
2. Click DATA REDUCTION and then FACTOR.
3. Move “Prevents Cavities [v1],” “Shiny Teeth [v2],” “Strengthen Gums [v3],”
“Freshens Breath [v4],” “Tooth Decay Unimportant [v5],” and “Attractive
Teeth [v6]” into the VARIABLES box
4. Click on DESCRIPTIVES. In the pop-up window, in the STATISTICS box
check INITIAL SOLUTION. In the CORRELATION MATRIX box, check KMO
AND BARTLETT’S TEST OF SPHERICITY and also check REPRODUCED.
Click CONTINUE.
5. Click on EXTRACTION. In the pop-up window, for METHOD select PRINCIPAL COMPONENTS (default). In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, check EIGENVALUES OVER 1 (default). In the DISPLAY box, check UNROTATED FACTOR SOLUTION. Click CONTINUE.
6. Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY
box, check ROTATED SOLUTION. Click CONTINUE.
7. Click on SCORES. In the pop-up window, check DISPLAY FACTOR SCORE
COEFFICIENT MATRIX. Click CONTINUE.
8. Click OK.
77. Cluster Analysis
Cluster analysis
A multivariate approach for grouping observations
based on similarity among measured variables.
Cluster analysis is an important tool for identifying market
segments.
Cluster analysis classifies individuals or objects into a
small number of mutually exclusive and exhaustive
groups.
Objects or individuals are assigned to groups so that
there is great similarity within groups and much less
similarity between groups.
Clusters should have high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity.
81. SPSS Windows
To select this procedure using SPSS for
Windows, click:
Analyze>Classify>Hierarchical Cluster
…
Analyze>Classify>K-Means Cluster …
Analyze>Classify>Two-Step Cluster
82. SPSS Windows: Hierarchical Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],”
“Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check
STATISTICS and PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION
SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS.
Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER
OF CLUSTERS, enter 4. Click CONTINUE.
6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE
box, check ALL CLUSTERS (default). In the ORIENTATION box, check
VERTICAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the
MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE.
Click CONTINUE.
8. Click OK.
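An approximate Python equivalent of these steps using scipy (invented ratings stand in for v1–v6; scipy's Ward linkage works from Euclidean distances, which matches the squared-Euclidean Ward criterion up to a monotone transformation):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Stand-in ratings: 20 respondents x 6 variables (v1..v6)
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))

Z = linkage(X, method="ward")                    # agglomeration schedule
labels = fcluster(Z, t=3, criterion="maxclust")  # e.g., a 3-cluster solution
print(labels)
# dendrogram(Z)  # with matplotlib, draws the tree requested under PLOTS
```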
83. SPSS Windows: K-Means
Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then K-MEANS CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best
Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into
the VARIABLES box.
4. For NUMBER OF CLUSTERS, select 3.
5. Click on OPTIONS. In the pop-up window, in the STATISTICS
box, check INITIAL CLUSTER CENTERS and CLUSTER
INFORMATION FOR EACH CASE. Click CONTINUE.
6. Click OK.
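The scikit-learn counterpart of this K-means run, on the same kind of stand-in data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in ratings: 20 respondents x 6 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # final cluster centers
print(km.labels_)           # cluster membership for each case
```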
84. SPSS Windows: Two-Step
Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then TWO-STEP CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best
Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into
the CONTINUOUS VARIABLES box.
4. For DISTANCE MEASURE, select EUCLIDEAN.
5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.
6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION
CRITERION (AIC).
7. Click OK.
89. SPSS Windows
The multidimensional scaling program allows individual
differences as well as aggregate analysis using ALSCAL. The
level of measurement can be ordinal, interval or ratio. Both the
direct and the derived approaches can be accommodated.
To select multidimensional scaling procedures using SPSS
for Windows, click:
Analyze>Scale>Multidimensional Scaling …
The conjoint analysis approach can be implemented using
regression if the dependent variable is metric (interval or
ratio).
This procedure can be run by clicking:
Analyze>Regression>Linear …
90. SPSS Windows : MDS
First convert the similarity ratings to distances by subtracting each value of Table 21.1 from 8. The form of the data matrix has to be square symmetric (diagonal elements zero and distances above and below the diagonal; see SPSS file Table 21.1 Input).
1. Select ANALYZE from the SPSS menu bar.
2. Click SCALE and then MULTIDIMENSIONAL SCALING
(ALSCAL).
3. Move “Aqua-Fresh [AquaFresh],” “Crest [Crest],” “Colgate
[Colgate],” “Aim [Aim],” “Gleem [Gleem],” “Ultra Brite
[UltraBrite],” “Ultra-Brite [var00007],” “Close-Up [CloseUp],”
“Pepsodent [Pepsodent],” and “Sensodyne [Sensodyne]” into
the VARIABLES box.
91. SPSS Windows : MDS
4. In the DISTANCES box, check DATA ARE DISTANCES.
SHAPE should be SQUARE SYMMETRIC (default).
5. Click on MODEL. In the pop-up window, in the LEVEL OF
MEASUREMENT box, check INTERVAL. In the SCALING
MODEL box, check EUCLIDEAN DISTANCE. In the
CONDITIONALITY box, check MATRIX. Click CONTINUE.
6. Click on OPTIONS. In the pop-up window, in the DISPLAY
box, check GROUP PLOTS, DATA MATRIX and MODEL
AND OPTIONS SUMMARY. Click CONTINUE.
7. Click OK.
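A rough scikit-learn sketch of metric MDS on a precomputed distance matrix; the 4x4 matrix below is invented, standing in for the converted Table 21.1 data:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical square symmetric distance matrix (zero diagonal)
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.0],
              [5.0, 4.0, 0.0, 2.0],
              [6.0, 5.0, 2.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)  # 2-D perceptual map coordinates
print(coords)
print(mds.stress_)             # badness of fit (lower is better)
```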
93. Further Reading
Cooper, D.R. and Schindler, P.S. (2011) Business Research Methods, 11th edn, McGraw-Hill.
Zikmund, W.G., Babin, B.J., Carr, J.C. and Griffin, M. (2010) Business Research Methods, 8th edn, South-Western.
Saunders, M., Lewis, P. and Thornhill, A. (2012) Research Methods for Business Students, 6th edn, Prentice Hall.
Saunders, M. and Lewis, P. (2012) Doing Research in Business & Management, FT Prentice Hall.