### analysis part 02.pptx

1. Quantitative Data Analysis: 1
2. Data Analysis
   - Descriptive/Frequency: demographics (number and/or percentage); cross-tabulation (number and/or percentage)
   - Goodness of measures: measurement validity and reliability. Reliability: the degree to which measures are free from random error and therefore yield consistent results.
   - Inferential/Hypothesis testing: t-test or ANOVA; correlation; regression
3. The Right Technique in Data Analysis?
   - What is the purpose of the analysis? Descriptive, comparing groups, or examining relationships
   - What is the level of measurement? Parametric or non-parametric
   - How many variables are involved? Univariate, bivariate, or multivariate
   - What kind of test? Descriptive or inferential. If inferential, set the significance level.
4. Descriptive Analysis. Purpose: to describe the distribution of the demographic variables.
   - Frequency distribution – for 1 ordinal or nominal variable
   - Cross-tabulation – for 2 ordinal or nominal variables
   - Mean – for 1 interval or ratio variable
   - Means of subgroups – for 1 interval or ratio variable by subgroup
5. GOODNESS OF MEASURE
   - Validity (construct) – factor analysis
   - Reliability – Cronbach's alpha
6. FACTOR ANALYSIS
   - Go to Analyze – Dimension Reduction – Factor
   - Enter the items of the IV or DV into the dialogue box
   - Tick Descriptives – initial solution – coefficients – significance levels – determinant – KMO and Bartlett's test of sphericity – inverse – reproduced – anti-image
   - Tick Extraction – principal components
   - Tick Rotation – Varimax – rotated solution – loading plots
   - Tick Scores – display factor score coefficient matrix
   - Tick Options – sorted by size
   - Click OK
7. FACTOR ANALYSIS … CONT. To conduct a Factor Analysis, start from the “Analyze” menu. This procedure is intended to reduce the complexity in a set of data, so we choose “Dimension Reduction” from the menu. And the choice in this category is “Factor,” for factor analysis. This dataset gives children’s scores on subtests of the Wechsler Intelligence Scale for Children (WISC-III). The Wechsler scales are scored to give you a “verbal” and a “performance” IQ. The question is whether we can reproduce the verbal vs. nonverbal distinction, with the appropriate subtests grouping into each category, using factor analysis.
8. FACTOR ANALYSIS … CONT. Factor analysis has no IVs and DVs, so everything you want to get factors for just goes into the list labeled “variables.” In this case, it’s all the variables. In some datasets, there is also a dummy “subject number” variable included. Be sure that you don’t include subject number as one of the variables for your factor analysis!
9. FACTOR ANALYSIS … CONT. In this dialog box, you can make a number of selections. First, I want you to un-check the box labeled “Unrotated factor solution.” This is a default setting for your printout, but it just gives you information that you don’t need, and that may distract you from the real answers. So, always go into the Extraction sub-dialog and un-check this box. Second, check the box for a “scree plot.” This will give you a scree diagram, which is one way to decide how many factors to extract. Third, look at the section labeled “Extract.” As you can see, the default setting is for SPSS to use the Kaiser stopping criterion (i.e., all factors with eigenvalues greater than 1) to decide how many factors to extract. You can set a more conservative stopping criterion by requiring each factor to have a higher eigenvalue. Or, if you already know exactly how many factors you think there will be, you can set the extraction method to a specific “Number of factors,” and then put the number into this box.
10. FACTOR ANALYSIS … CONT. This dialog allows you to choose a "rotation method" for your factor analysis. A rotation method makes the extracted factors as distinct from one another as possible, and helps you interpret the factors by loading each variable primarily on one of them. However, you still need to decide whether you want an "orthogonal" solution (factors are not highly correlated with each other), or an "oblique" solution (factors are correlated with one another). If you want an oblique solution, the only choice SPSS gives you is "Direct Oblimin." All of the others are orthogonal solutions; the one that you'll use most often from these choices is the default value, "Varimax." Most of the factor analyses you will see in published articles use a Varimax rotation. Make sure that the check-box for a "rotated solution" is checked. The rotated solution gives you the factor loadings for each individual variable in your dataset, which are what you use to interpret the meaning of (i.e., make up names for) the different factors.
11. FACTOR ANALYSIS … CONT. This table shows you the actual factors that were extracted. If you look at the section labeled “Rotation Sums of Squared Loadings,” it shows you only those factors that met your cut-off criterion (extraction method). In this case, there were three factors with eigenvalues greater than 1. SPSS always extracts as many factors initially as there are variables in the dataset, but the rest of these didn’t make the grade. The “% of variance” column tells you how much of the total variability (in all of the variables together) can be accounted for by each of these summary scales or factors. Factor 1 accounts for 27.485% of the variability in all 11 variables, and so on.
13. FACTOR ANALYSIS … CONT.
14. KMO and Bartlett's Test

    | Kaiser-Meyer-Olkin Measure of Sampling Adequacy | .619 |
    |---|---|
    | Bartlett's Test of Sphericity: Approx. Chi-Square | 327.667 |
    | df | 91 |
    | Sig. | .000 |

    Total Variance Explained (Extraction Method: Principal Component Analysis)

    | Component | Initial Total | % of Variance | Cumulative % | Extraction Total | % of Variance | Cumulative % | Rotation Total | % of Variance | Cumulative % |
    |---|---|---|---|---|---|---|---|---|---|
    | 1 | 2.672 | 19.087 | 19.087 | 2.672 | 19.087 | 19.087 | 1.941 | 13.864 | 13.864 |
    | 2 | 2.116 | 15.111 | 34.198 | 2.116 | 15.111 | 34.198 | 1.911 | 13.648 | 27.512 |
    | 3 | 1.314 | 9.385 | 43.583 | 1.314 | 9.385 | 43.583 | 1.521 | 10.866 | 38.378 |
    | 4 | 1.129 | 8.065 | 51.648 | 1.129 | 8.065 | 51.648 | 1.489 | 10.635 | 49.012 |
    | 5 | 1.024 | 7.316 | 58.964 | 1.024 | 7.316 | 58.964 | 1.393 | 9.952 | 58.964 |
    | 6 | .915 | 6.538 | 65.502 | | | | | | |
    | 7 | .908 | 6.485 | 71.987 | | | | | | |
    | 8 | .820 | 5.860 | 77.848 | | | | | | |
    | 9 | .729 | 5.209 | 83.056 | | | | | | |
    | 10 | .628 | 4.484 | 87.540 | | | | | | |
    | 11 | .541 | 3.865 | 91.405 | | | | | | |
    | 12 | .471 | 3.365 | 94.771 | | | | | | |
    | 13 | .403 | 2.876 | 97.647 | | | | | | |
    | 14 | .329 | 2.353 | 100.000 | | | | | | |

    Output of Factor Analysis
15. Rotated Component Matrix (Output of Factor Analysis … cont.)

    | Item | 1 | 2 | 3 | 4 | 5 |
    |---|---|---|---|---|---|
    | BO12 | .742 | .070 | .084 | .291 | .013 |
    | BO7 | .724 | -.055 | -.214 | -.086 | .146 |
    | BO6 | .722 | -.011 | .243 | .136 | .129 |
    | BO8 | .063 | .801 | .149 | -.044 | -.041 |
    | BO14 | -.303 | .772 | .183 | .025 | .170 |
    | BO13 | -.197 | -.627 | .137 | .127 | .333 |
    | BO11 | -.061 | .083 | -.802 | .062 | .153 |
    | BO4 | .035 | .318 | .599 | .099 | -.074 |
    | BO9 | -.013 | -.016 | -.016 | .820 | .054 |
    | BO1 | -.236 | .230 | .359 | -.556 | .143 |
    | BO2 | .303 | .022 | .190 | .539 | .223 |
    | BO3 | .196 | .114 | -.035 | -.099 | .797 |
    | BO10 | -.127 | .169 | .212 | -.214 | -.577 |
    | BO5 | .049 | .260 | .349 | -.136 | -.379 |

    Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 6 iterations.
16. RELIABILITY
   - Go to Analyze – Scale – Reliability Analysis
   - Enter the items to be analyzed
   - Tick Statistics – Descriptives for: item, scale, scale if item deleted
   - Verify the output:
     - If Cronbach's alpha for the scale is > .70, the reliability of the variable is achieved (Nunnally, 1978).
     - If not, check the "Alpha if Item Deleted" column to see whether dropping an item would improve reliability.
     - Drop the item indicated and run the reliability analysis again.
     - Compute a summated scale to form the variable.
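Cronbach's alpha itself is straightforward to compute from the item and scale variances. Below is a minimal pure-Python sketch of the statistic SPSS reports (not SPSS's own code; the function name is our own):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of respondent scores per item
    (all lists must be the same length).
    """
    k = len(items)  # number of items in the scale

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total scale score for each respondent
    totals = [sum(scores) for scores in zip(*items)]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - sum(sample_var(i) for i in items) / sample_var(totals))
```

With perfectly consistent items the statistic equals 1; values above .70 would pass the Nunnally (1978) rule of thumb quoted on the slide.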
17. RELIABILITY ANALYSIS – SCALE (ALPHA)

    | Item | Mean | Std Dev | Cases |
    |---|---|---|---|
    | BO12 | 3.9580 | 1.0269 | 143.0 |
    | BO6 | 3.4825 | .9704 | 143.0 |
    | BO7 | 2.9650 | .8914 | 143.0 |

    Statistics for scale: Mean 10.4056, Variance 4.9048, Std Dev 2.2147, N of Variables 3

    Item-total Statistics

    | Item | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Alpha if Item Deleted |
    |---|---|---|---|---|
    | BO12 | 6.4476 | 2.3194 | .4894 | .5030 |
    | BO6 | 6.9231 | 2.4518 | .4974 | .4916 |
    | BO7 | 7.4406 | 2.9243 | .3890 | .6348 |

    Reliability Coefficients: N of Cases = 143.0, N of Items = 3, Alpha = .6465
18. When do we need which test?

    | 1 Dependent Variable | 1 Independent Variable | Test |
    |---|---|---|
    | Binary | Metric | Logistic regression |
    | Binary | Non-metric | Chi-square test |
    | Non-metric | Metric | Logistic regression |
    | Non-metric | Binary | Mann-Whitney test |
    | Metric | Binary | t-test |
    | Metric | Metric | Regression analysis |
    | Metric | Nominal | Analysis of variance |
19. When do we need which test?

    | 1 Dependent Variable | 2 or more Independent Variables | Test |
    |---|---|---|
    | Non-metric | Metric | Logistic regression |
    | Non-metric | Non-metric | Loglinear analysis |
    | Metric | Metric | Multiple regression |
    | Metric | Non-metric | Analysis of variance |
20. A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. Two common chi-square tests check whether observed frequencies in one or more categories match expected frequencies. A contingency table is a tool used to summarize and analyze the relationship between two categorical variables.

    The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed. For example, you could use the Mann-Whitney U test to understand whether attitudes towards pay discrimination, measured on an ordinal scale, differ based on gender (the dependent variable would be "attitudes towards pay discrimination" and the independent variable would be "gender", with two groups: "male" and "female").

    A t-test is a statistical test used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups differ from one another.

    Log-linear analysis is a statistical test used to determine whether the proportions of categories in two or more group variables differ significantly from each other. To use this test, you should have two or more group variables with two or more options in each group variable.
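The chi-square test above compares observed cell counts with the counts expected under independence (expected count = row total × column total / grand total). A minimal pure-Python sketch of the statistic (the function name is our own):

```python
def chi_square_stat(observed):
    """Pearson chi-square statistic for a contingency table,
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under the null hypothesis of independence
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat
```

The statistic is then compared against a chi-square distribution with (rows − 1)(columns − 1) degrees of freedom; a table whose observed counts exactly match the expected counts yields a statistic of 0.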
21. Correlation

    H1: Autonomy and innovative orientation among Bumiputera SMEs in northern Malaysia are significantly related.

    Correlations (N = 210)

    |  | Autonomy | Innovative |
    |---|---|---|
    | Autonomy: Pearson Correlation | 1 | .072 |
    | Autonomy: Sig. (2-tailed) | . | .297 |
    | Innovative: Pearson Correlation | .072 | 1 |
    | Innovative: Sig. (2-tailed) | .297 | . |

    Interpretation: (r = .072, p = .297). With the significance level set at p < .05, there is no statistically significant correlation between autonomy and innovativeness. Therefore, H1 is rejected.
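The Pearson r that SPSS reports in the table above is the covariance of the two variables divided by the product of their standard deviations. A minimal pure-Python sketch (hypothetical helper, illustrative data only):

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Numerator: sum of cross-products of deviations
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Denominator: product of the square roots of the sums of squares
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den
```

A value near +1 or -1 indicates a strong linear relationship; the slide's r = .072 is close to 0, consistent with the non-significant result.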
22.  The purpose of regression models is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.  The computational problem that needs to be solved in regression analysis is to fit a straight line to a number of points.  Y = b0 + b1x1 + b2x2 + … + bnxn + e Regression models
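For a single predictor, fitting the straight line in the equation above has a closed-form least-squares solution. A minimal sketch (hypothetical helper name, illustrative data):

```python
def ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: sum of cross-products over sum of squared x-deviations
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
         sum((a - mx) ** 2 for a in x)
    # Intercept: the fitted line passes through the point of means
    b0 = my - b1 * mx
    return b0, b1
```

For example, the points (1, 3), (2, 5), (3, 7) lie exactly on the line y = 1 + 2x, so the fit recovers b0 = 1 and b1 = 2 with zero error term.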
23.  Linear regression  1 dependent variable: continuous/scale  One or more independent variables: continuous/scale  Hierarchical regression  1 dependent variable: continuous/scale  Multiple blocks of independent variables: continuous/scale  Logistic regression  1 dependent variable: binary  One or more independent variables: continuous/scale Types of Regression Models 24
24. Output of SPSS Regression Analyses 25
25. Output of SPSS Regression Analyses. The F-test in the ANOVA table assesses the overall significance of the regression model, i.e., whether the predictors jointly explain a significant share of the variance in the dependent variable.
26. Output of SPSS Regression Analyses. Confidence interval = sample mean ± margin of error. To obtain this confidence interval, add and subtract the margin of error from the sample mean; the results are the upper and lower limits of the confidence interval.
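The arithmetic above can be sketched for the common z-based interval for a mean (a simplified illustration; SPSS itself uses t-based intervals, and the helper name is our own):

```python
import math

def mean_ci(xs, z=1.96):
    """Approximate 95% confidence interval for the mean of a sample,
    using the normal critical value z = 1.96."""
    n = len(xs)
    m = sum(xs) / n
    # Sample standard deviation
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    # Margin of error = z * standard error of the mean
    margin = z * sd / math.sqrt(n)
    return m - margin, m + margin
```

The interval is symmetric: its midpoint is the sample mean, and its half-width is the margin of error.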
27. MULTIPLE REGRESSION ANALYSIS…CONT. Consider some multiple regression assumptions:
1. Normality – verify skewness < 2.0 or inspect the histogram. (Skewness is a measure of the distortion of a symmetrical distribution, or asymmetry, in a data set.)
2. Linearity – verify the P-P plot of standardized regression residuals.
3. Homoscedasticity – an assumption of equal or similar variances of the residuals.
4. Independence of error terms – Durbin-Watson statistic between 1.5 and 2.5.
5. Free from multicollinearity – correlations among the IVs < .70.
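The skewness check in assumption 1 can be computed as the standardized third moment. A minimal pure-Python sketch (population form; the function name is our own, and the |skewness| < 2.0 rule of thumb is the one quoted on the slide):

```python
def skewness(xs):
    """Skewness as the standardized third moment of a sample."""
    n = len(xs)
    m = sum(xs) / n
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    # Average cubed deviation, scaled by the cube of the std. deviation
    return sum((x - m) ** 3 for x in xs) / n / sd ** 3
```

Symmetric data yields a skewness of 0; a long right tail yields a positive value, a long left tail a negative one.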
28. Reporting Regression Analyses
1. Describe descriptive statistics (means, std. dev.) of all variables.
2. Report on testing of assumptions – especially if assumptions are violated and what was done about it.
3. Report model fit statistics (F, df1, df2, R2).
4. Report parameter estimates for the constant and each IV:
   - standardized beta
   - t-value and significance
   - (confidence intervals)
29.  Type of regression models where  The dependent variable is binary  [or ordinal: ordered logistic regression (e.g. 3 categories: low, medium, high)]  Checks whether we can predict in which category we will land based on the values of the IV.  Essentially compares a model with predictors (BLOCK 1) against a model without predictors (BLOCK 0):  is a prediction with our variables better than random chance? Example: http://eprints.qut.edu.au/31606/ Logistic Regression Analysis 30
30. Logistic Regression Analysis: Output 31
31. Logistic Regression Analysis: Output 32
32. Reporting Logistic Regression Analyses 33
33.  a statistical method used to test differences between two or more means.  Inferences about means are made by analyzing variance.  Think of it as an extension of t-tests  To two or more groups  To means + variance rather than only means.  In a typical ANOVA, the null hypothesis is that all groups are random samples of the same population.  For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none).  Rejecting the null hypothesis would imply that different treatments result in altered effects.  Often used in experimental research, to study effects of treatments. Analysis of Variance Models 34
34.  One-way ANOVA  used to test for differences among two or more independent groups (means).  Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test (when there are only two means to compare, the t-test and the ANOVA F-test are equivalent).  Factorial ANOVA  used when the experimenter wants to study the interaction effects among the treatments.  Repeated measures ANOVA  used when the same subjects are used for each treatment (e.g., in a longitudinal study).  Multivariate analysis of variance (MANOVA)  used when there is more than one dependent variable.  Analysis of covariance (ANCOVA)  blends ANOVA and regression: evaluates whether population means of a DV are equal across levels of a categorical IV [treatment], while statistically controlling for the effects of other continuous variables that are not of primary interest [covariates]. Types of Analysis of Variance Models 35
35. When can we use ANOVA? • The t-test is used to compare the means of two-groups. • One-way ANOVA is used to compare the means of two or more groups. • We can use one-way ANOVA whenever the dependent variable (DV) is numerical and the independent variable (IV) is categorical. • The independent variable in ANOVA is also called a factor. 36
36. Examples The following are situations where we can use ANOVA: • Testing the differences in blood pressure among different groups of people (DV is blood pressure and the group is the IV). • Testing which type of social media affects hours of sleep (type of social media used is the IV and hours of sleep is the DV). 37
37.  The type of ANOVA model is highly dependent on your research design and theory; in particular:  What are between-subject factors? How many?  What are within-subject factors? How many?  What are treatments? How many?  Which factors are theoretically relevant, which are mere controls? ANOVA and Research Designs 38
38.  Independence, normality and homogeneity of the variances of the residuals  Note there are no necessary assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis testing has assumptions and practical limitations. ANOVA Assumptions 39
39.  One-way = one-way between groups model  E.g., school performance between boys versus girls  Two-way = two one-ways for each factor PLUS interaction between two factors  E.g., school performance between boys versus girls and locals versus internationals  Three-way  You get the idea… One-way and two-way ANOVA 40
40.  Injuries sustained by kids wearing superhero costumes  Does it depend on which costume they wear?  Superman, Spiderman, Hulk, Ninja Turtle?  Adopted from http://www.statisticshell.com/docs/onewayanova.pdf Illustration: Analysis of Variance 41
41.  Are injuries sustained random or significantly dependent on wearing superhero costumes?  Is there any order of injuries sustained by type of costume? What ANOVA could tell us 42
42. What ANOVA could tell us. [Contrast diagram: variance in injury severity explained by different costumes. Contrast 1: flying superheroes (Superman, Spiderman) vs. non-flying superheroes (Hulk, Ninja Turtle); Contrasts 2 and 3 compare the costumes within each group.]
43. Assumptions of ANOVA • The observations in each group are normally distributed. This can be tested by plotting the numerical variable separately for each group and checking that they all have a bell shape. Alternatively, you could use the Shapiro-Wilk test for normality. 44
44. Assumptions • The groups have equal variances (i.e., homogeneity of variance). You can plot each group separately and check that they exhibit similar variability. Alternatively, you can use Levene’s test for homogeneity. • The observations in each group are independent. This could be assessed by common sense looking at the study design. For example, if there is a participant in more than one group, your observations are not independent. 45
45. Hypothesis Testing ANOVA tests the null hypothesis: H0 : The groups have equal means versus the alternative hypothesis: H1 : At least one group mean is different from the other group means. 46 F-Test
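The F-test that decides between H0 and H1 compares between-group variance to within-group variance. A minimal pure-Python sketch of the one-way ANOVA F statistic (the function name is our own, illustrative data only):

```python
def anova_f(groups):
    """One-way ANOVA F statistic; groups is a list of lists of scores."""
    all_x = [x for g in groups for x in g]
    grand_mean = sum(all_x) / len(all_x)
    # Between-groups sum of squares: group sizes times squared
    # deviations of group means from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups sum of squares: deviations from each group's own mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F (mean square between much bigger than mean square within) leads to rejecting H0; with exactly two groups, F equals the square of the independent-samples t statistic.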
46. ANOVA in SPSS 47 Example: Is there a difference in optimism scores for young, middle-aged and old participants? Categorical IV - Age with 3 levels: • 29 and younger • Between 30 and 44 • 45 or above Continuous DV – Optimism scores
47. ANOVA in SPSS 48 Interpreting the output: 1. Check that the groups have equal variances using Levene’s test for homogeneity. • Check the significance value (Sig.) for Levene’s test Based on Mean. • If this number is greater than .05 you have not violated the assumption of homogeneity of variance.
48. ANOVA in SPSS 49 Interpreting the output: 2. Check the significance of the ANOVA. • If the Sig. value is less than or equal to .05, there is a significant difference somewhere among the mean scores on your dependent variable for the three groups. • However, this does not tell us which group is different from which other group.
49. ANOVA in SPSS 50 Interpreting the output: 3. ONLY if the ANOVA is significant, check the significance of the differences between each pair of groups in the table labelled Multiple Comparisons.
50. ANOVA in SPSS 51 Calculating effect size: • In an ANOVA, effect size will tell us how large the difference between groups is. • We will calculate eta squared, which is one of the most common effect size statistics. Eta squared = Sum of squares between groups Total sum of squares
51. ANOVA in SPSS 52 Calculating effect size: 179.07 8513.02 = .02 According to Cohen (1988): Small effect: .01 Medium effect: .06 Large effect: .14
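The effect-size arithmetic above can be reproduced directly from the two sums of squares shown on the slide:

```python
# Values from the ANOVA table on this slide
ss_between = 179.07   # sum of squares between groups
ss_total = 8513.02    # total sum of squares

# Eta squared = proportion of total variance explained by group membership
eta_squared = ss_between / ss_total   # about .02, a small effect per Cohen (1988)
```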
52. ANOVA in SPSS 53 Example results write-up: A one way between-groups analysis of variance was conducted to explore the impact of age on levels of optimism. Participants were divided into three groups according to their age (Group 1: 29yrs or less; Group 2: 30 to 44yrs; Group 3: 45yrs and above). There was a statistically significant difference at the p < .05 level in optimism scores for the three age groups: F (2, 432) = 4.6, p = .01. Despite reaching statistical significance, the actual difference in mean scores between the groups was quite small. The effect size, calculated using eta squared, was .02. Post-hoc comparisons using the Tukey HSD test indicated that the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different from Group 3 (M = 22.96, SD = 4.49).
53. ANOVA in SPSS 54 Note: Results are usually rounded to two decimal places
54. Descriptive Statistics-Numeric Data • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  DESCRIPTIVES • Choose any variables to be analyzed and place them in the box on the right • Options include summary statistics such as the mean, std. deviation, minimum, and maximum
56. Descriptive Statistics-General Data • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS FREQUENCIES • Choose any variables to be analyzed and place them in box on right • Options include (For Categorical Variables): • Frequency Tables • Pie Charts, Bar Charts • Options include (For Numeric Variables) • Frequency Tables (Useful for discrete data) • Measures of Central Tendency, Dispersion, Percentiles • Pie Charts, Histograms
57. Example 1.4 - Smoking Status [frequency table output: counts, percents, valid and cumulative percents]
58. Vertical Bar Charts and Pie Charts • After Importing your dataset, and providing names to variables, click on: • GRAPHS  BAR…  SIMPLE (Summaries for Groups of Cases)  DEFINE • Bars Represent N of Cases (or % of Cases) • Put the variable of interest as the CATEGORY AXIS • GRAPHS  PIE… (Summaries for Groups of Cases)  DEFINE • Slices Represent N of Cases (or % of Cases) • Put the variable of interest as the DEFINE SLICES BY
59. Example 1.5 - Antibiotic Study [bar chart: Count by OUTCOME (categories 1-5)]
60. Histograms • After Importing your dataset, and providing names to variables, click on: • GRAPHS  HISTOGRAM • Select Variable to be plotted • Click on DISPLAY NORMAL CURVE if you want a normal curve superimposed (see Chapter 3).
61. Example 1.6 - Drug Approval Times [histogram of MONTHS: Mean = 32.1, Std. Dev = 20.97, N = 175]
62. Side-by-Side Bar Charts • After Importing your dataset, and providing names to variables, click on: • GRAPHS  BAR…  Clustered (Summaries for Groups of Cases)  DEFINE • Bars Represent N of Cases (or % of Cases) • CATEGORY AXIS: Variable that represents groups to be compared (independent variable) • DEFINE CLUSTERS BY: Variable that represents outcomes of interest (dependent variable)
63. Example 1.7 - Streptomycin Study [clustered bar chart: Count by OUTCOME (1-6), clustered by TRT (1, 2)]
64. Scatterplots • After Importing your dataset, and providing names to variables, click on: • GRAPHS  SCATTER  SIMPLE  DEFINE • For Y-AXIS, choose the Dependent (Response) Variable • For X-AXIS, choose the Independent (Explanatory) Variable
65. Example 1.8 - Theophylline Clearance [scatterplot: THCLRNCE vs. DRUG]
66. Scatterplots with 2 Independent Variables • After Importing your dataset, and providing names to variables, click on: • GRAPHS  SCATTER  SIMPLE  DEFINE • For Y-AXIS, choose the Dependent Variable • For X-AXIS, choose the Independent Variable with the most levels • For SET MARKERS BY, choose the Independent Variable with the fewest levels
67. Example 1.8 - Theophylline Clearance [scatterplot: THCLRNCE vs. SUBJECT, markers by DRUG (Tagamet, Pepcid, Placebo)]
68. Contingency Tables for Conditional Probabilities • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, select the variable you are conditioning on (Independent Variable) • For COLUMNS, select the variable you are finding the conditional probability of (Dependent Variable) • Click on CELLS • Click on ROW Percentages
69. Example 1.10 - Alcohol & Mortality [crosstab output with row percentages]
70. Independent Sample t-Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  COMPARE MEANS  INDEPENDENT SAMPLES T-TEST • For TEST VARIABLE, Select the dependent (response) variable(s) • For GROUPING VARIABLE, Select the independent variable. Then define the names of the 2 levels to be compared (this can be used even when the full dataset has more than 2 levels for independent variable).
71. Example 3.5 - Levocabastine in Renal Patients [independent samples t-test output: group statistics and Levene's test for equality of variances]
72. Paired t-test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  COMPARE MEANS  PAIRED SAMPLES T-TEST • For PAIRED VARIABLES, Select the two dependent (response) variables (the analysis will be based on first variable minus second variable)
73. Example 3.7 - Cmax in SRC&IRC Codeine [paired samples t-test output]
74. Chi-Square Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the Independent Variable • For COLUMNS, Select the Dependent Variable • Under STATISTICS, Click on CHI-SQUARE • Under CELLS, Click on OBSERVED, EXPECTED, ROW PERCENTAGES, and ADJUSTED STANDARDIZED RESIDUALS • NOTE: Large ADJUSTED STANDARDIZED RESIDUALS (in absolute value) show which cells are inconsistent with null hypothesis of independence. A common rule of thumb is seeing which if any cells have values >3 in absolute value
75. Example 5.8 - Marital Status & Cancer [crosstab of marital status (Single, Married, Widowed, Div/...) by cancer outcome, with observed and expected counts, row percentages, adjusted residuals, and chi-square test]
76. Fisher’s Exact Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the Independent Variable • For COLUMNS, Select the Dependent Variable • Under STATISTICS, Click on CHI-SQUARE • Under CELLS, Click on OBSERVED and ROW PERCENTAGES • NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep output straight.
77. Example 5.5 - Antiseptic Experiment [crosstab with row percentages, chi-square and Fisher's exact test output]
78. McNemar’s Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the outcome for condition/time 1 • For COLUMNS, Select the outcome for condition/time 2 • Under STATISTICS, Click on MCNEMAR • Under CELLS, Click on OBSERVED and TOTAL PERCENTAGES • NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep output straight.
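McNemar's test, selected above, depends only on the two discordant cells of the paired 2×2 table (pairs that changed in opposite directions). A minimal sketch of the test statistic without continuity correction (hypothetical helper name):

```python
def mcnemar_chi2(b, c):
    """McNemar chi-square statistic from the two discordant cell counts
    b and c of a paired 2x2 table (no continuity correction)."""
    return (b - c) ** 2 / (b + c)
```

The statistic is referred to a chi-square distribution with 1 degree of freedom; equal discordant counts give a statistic of 0 (no evidence of change between the two conditions).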
79. Example 5.6 - Report of Implant Leak [crosstab with total percentages and McNemar's test P-value]
80. Relative Risks and Odds Ratios • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the Independent Variable • For COLUMNS, Select the Dependent Variable • Under STATISTICS, Click on RISK • Under CELLS, Click on OBSERVED and ROW PERCENTAGES • NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep output straight.
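The RISK statistics requested above come from a 2×2 table with the outcome-present column coded first. A minimal sketch of the two measures (hypothetical helper name, illustrative counts):

```python
def risk_measures(a, b, c, d):
    """Relative risk and odds ratio from a 2x2 table laid out as:
    exposed row:   a = outcome present, b = outcome absent
    unexposed row: c = outcome present, d = outcome absent
    """
    # Relative risk: ratio of the two row risks
    rr = (a / (a + b)) / (c / (c + d))
    # Odds ratio: cross-product ratio
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio
```

When the outcome is rare, the odds ratio approximates the relative risk; with common outcomes the two can diverge noticeably.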
81. Example 5.1 - Pamidronate Study [crosstab with row percentages and risk estimate output (relative risk / odds ratio)]
82. Example 5.2 - Lip Cancer [crosstab with row percentages and odds ratio output]
83. Correlation • After Importing your dataset, and providing names to variables, click on: • ANALYZE  CORRELATE  BIVARIATE • Select the VARIABLES • Select the PEARSON CORRELATION • Select the two-tailed test of significance • Select Flag significant correlations
84. Linear Regression • After Importing your dataset, and providing names to variables, click on: • ANALYZE  REGRESSION  LINEAR • Select the DEPENDENT VARIABLE • Select the INDEPENDENT VARIABLE(S) • Click on STATISTICS, then ESTIMATES, CONFIDENCE INTERVALS, MODEL FIT
85. Examples 7.1-7.6 - Gemfibrozil Clearance [regression coefficients table: B, std. error, beta, t, sig., confidence intervals]
86. Examples 7.1-7.6 - Gemfibrozil Clearance [ANOVA table and model summary: R, R square, std. error of the estimate]
87. Linear Regression • We will introduce simple linear regression, in particular we will: • Learn when we can use simple linear regression • Learn the basic workings involved in simple linear regression • Linear Regression in SPSS • This presentation is intended for students in initial stages of Statistics. No previous knowledge is required. 90
88. Linear Regression • Regression is used to study the relationship between two variables. • How a change in one variable (e.g., someone’s exercise habits) can predict the outcome of another variable (e.g., general health). • We can use simple regression if both the dependent variable (DV) and the independent variable (IV) are numerical. • If the DV is numerical but the IV is categorical, it is best to use ANOVA. 91
89. Examples The following are situations where we can use regression: • Testing if IQ affects income (IQ is the IV and income is the DV). • Testing if study time affects grades (hours of study time is the IV and average grade is the DV). • Testing if exercise affects blood pressure (hours of exercise is the IV and blood pressure is the DV). 92
90. Displaying the data When both the DV and IV are numerical, we can represent data in the form of a scatterplot. 93
91. Displaying the data It is important to produce a scatterplot because it helps us to see whether the relationship is linear. In this example, the relationship between body fat % and chance of heart failure is not linear, and hence it is not sensible to use linear regression.
92. 95 • Straight line prediction model. • As an independent variable changes, what happens to the dependent variable? I.e., as an independent variable goes up and down, does the dependent variable go up and down? • They could either move in the same direction (positive relationship) or opposite direction (negative relationship) Linear Regression
94. Linear Regression
95. Linear Regression y = B0 + B1 * X + E (e.g., predicting grades from study time: B0 is the intercept, B1 the slope, and E the error term)
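SPSS estimates B0 and B1 by ordinary least squares. As a rough illustration of what happens under the hood, here is a minimal Python sketch; the study-time/grade data are invented for the example:

```python
# Minimal sketch of fitting y = B0 + B1*X by least squares,
# using hypothetical study-time/grade data (pure Python, no SPSS needed).
def fit_simple_regression(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx       # slope: covariation of X and Y over variation of X
    b0 = my - b1 * mx    # intercept: line passes through the mean point
    return b0, b1

hours = [1, 2, 3, 4, 5]        # hypothetical study time (IV)
grades = [52, 55, 61, 64, 68]  # hypothetical average grade (DV)
b0, b1 = fit_simple_regression(hours, grades)
print(round(b0, 2), round(b1, 2))  # → 47.7 4.1
```

Each extra hour of study in this made-up sample predicts about 4.1 extra grade points, which is exactly the kind of interpretation the slides give for B1.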
96. Linear Regression y = B0 - B1 * X + E (a negative relationship: Y decreases as X increases)
97. Assumptions of regression • The errors E are normally distributed. This can be tested by plotting a histogram of the residuals of the regression and checking that it has a bell shape. Alternatively, you could use the Shapiro-Wilk test for normality.
98. Assumptions of regression • There are no clear outliers. This can be checked with the scatterplot. Outliers (circled in red in the figure) can simply be removed from the analysis.
99. Hypothesis testing Regression tests the null hypothesis: H0: There is no effect of X on Y. versus the alternative hypothesis: H1: There is an effect of X on Y. If the null hypothesis is rejected, we conclude that there is a significant relationship between X and Y.
100. Hypothesis testing How do we know if we should reject the null hypothesis? We perform regression in SPSS and look at the p-value of the coefficient B1. If the p-value is less than 0.05, we reject the null hypothesis (the variable is significant); otherwise, we do not reject the null hypothesis (the variable is not significant).
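As a hedged sketch of where that p-value comes from: SPSS computes a t statistic for the slope, t = B1 / SE(B1), and converts it to an exact p-value from the t distribution. The Python below reproduces the t statistic by hand for hypothetical data; for the p-value itself you would consult a t table or let SPSS do it:

```python
import math

# Sketch of the slope t test on hypothetical data:
# t = B1 / SE(B1), where SE(B1) = sqrt(MSE / Sxx) and MSE = SSE / (n - 2).
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
se_b1 = math.sqrt(sse / (n - 2) / sxx)
t = b1 / se_b1
# Compare |t| with the two-sided 5% critical value for n - 2 = 3 df (about 3.18).
print(round(t, 2), abs(t) > 3.18)  # → 16.29 True
```

Since |t| far exceeds the critical value, the slope in this made-up sample would be reported as significant, mirroring the decision rule on the slide.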
101. Regression in SPSS Interpreting the output: 1. The first table that we’re interested in is the Model Summary. • The R value represents the simple correlation between our two variables; here it indicates a strong degree of correlation. • The R2 value indicates how much of the total variation in the dependent variable (perceived stress) can be explained by the independent variable (mastery). In this case, 37.3% can be explained. https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
102. Regression in SPSS Interpreting the output: 2. The next table is the ANOVA table, which shows us how well the regression equation fits the data (i.e., predicts the dependent variable). • The regression predicts the dependent variable significantly well (p < .001). https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
103. Regression in SPSS Interpreting the output: 3. The Coefficients table gives us the information that we need to predict stress from mastery, as well as to determine whether mastery contributes significantly to the model. Y = B0 + B1 * X Total perceived stress = 46.32 + (-.9 * Total Mastery)
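The fitted equation can be used directly for prediction. A tiny Python illustration using the coefficients from the slide; the mastery score of 20 is a hypothetical input:

```python
# Predict perceived stress from the slide's fitted model
# Total perceived stress = 46.32 + (-.9 * Total Mastery).
b0, b1 = 46.32, -0.9   # coefficients from the SPSS Coefficients table
mastery = 20           # hypothetical respondent's Total Mastery score
stress = b0 + b1 * mastery
print(round(stress, 2))  # → 28.32
```

Because B1 is negative, higher mastery predicts lower perceived stress, which is the substantive conclusion of the example.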
104. Regression in SPSS Example results write-up: A simple linear regression was carried out to test if total mastery significantly predicted total perceived stress. The results of the regression indicated that the model explained 37.3% of the variance and that the model was significant, F(1, 431) = 257.63, p < .001. It was found that total mastery significantly predicted total perceived stress (B1 = -.9, p < .001). The final predictive model was: total perceived stress = 46.32 + (-.9 * total mastery)
105. Regression in SPSS Results are usually rounded to two decimal places
106. Understanding Factor Analysis  Regardless of purpose, factor analysis is used to determine a small number of factors based on a number of inter-related quantitative variables.  Unlike directly measured variables such as speed, height, or weight, some variables such as egoism, creativity, happiness, religiosity, and comfort are not a single measurable entity.  They are constructs that are derived from the measurement of other, directly observable variables.
107. Understanding Factor Analysis  Constructs are usually defined as unobservable latent variables. E.g.:  motivation/love/hate/care/altruism/anxiety/worry/stress/product quality/physical aptitude/democracy/reliability/power.  Example: the construct of teaching effectiveness. Several variables (usually several scale items) are used to measure such a construct because it may include several dimensions.  Factor analysis measures constructs that are not directly observable by measuring several of their underlying dimensions.  The identification of such underlying dimensions (factors) simplifies the understanding and description of complex constructs.
108. Understanding Factor Analysis • Generally, the number of factors is much smaller than the number of measures. • Therefore, the expectation is that a factor represents a set of measures. • From this angle, factor analysis is viewed as a data-reduction technique, as it reduces a large number of overlapping variables to a smaller set of factors that reflect construct(s) or different dimensions of construct(s).
109. Understanding Factor Analysis  The assumption of factor analysis is that underlying dimensions (factors) can be used to explain complex phenomena.  Observed correlations between variables result from their sharing of factors.  Example: Correlations between a person’s test scores might be linked to shared factors such as general intelligence, critical thinking and reasoning skills, reading comprehension etc.
110. Ingredients of a Good Factor Analysis Solution • A major goal of factor analysis is to represent relationships among sets of variables parsimoniously yet keeping factors meaningful. • A good factor solution is both simple and interpretable. • When factors can be interpreted, new insights are possible.
111. Application of Factor Analysis  Defining indicators of constructs:  Ideally 4 or more measures should be chosen to represent each construct of interest.  The choice of measures should, as much as possible, be guided by theory, previous research, and logic.
112. Application of Factor Analysis  Defining dimensions for an existing measure: In this case the variables to be analyzed are chosen by the initial researcher and not the person conducting the analysis. Factor analysis is performed on a predetermined set of items/scales. Results of factor analysis may not always be satisfactory: The items or scales may be poor indicators of the construct or constructs. There may be too few items or scales to represent each underlying dimension.
113. Application of Factor Analysis  Selecting items or scales to be included in a measure. Factor analysis may be conducted to determine what items or scales should be included and excluded from a measure. Results of the analysis should not be used alone in making decisions of inclusions or exclusions. Decisions should be taken in conjunction with the theory and what is known about the construct(s) that the items or scales assess.
114. Steps in Factor Analysis • Factor analysis usually proceeds in four steps: • 1st Step: the correlation matrix for all variables is computed • 2nd Step: Factor extraction • 3rd Step: Factor rotation • 4th Step: Make final decisions about the number of underlying factors
115. Steps in Factor Analysis: The Correlation Matrix • 1st Step: the correlation matrix • Generate a correlation matrix for all variables • Identify variables not related to other variables • If the correlations between variables are small, it is unlikely that they share common factors (variables must be related to each other for the factor model to be appropriate). • Think of correlations in absolute value. • Correlation coefficients greater than 0.3 in absolute value are indicative of acceptable correlations. • Examine visually the appropriateness of the factor model.
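SPSS produces this matrix automatically; as an illustrative sketch, the Pearson correlations and the |r| > 0.3 screen can be computed by hand. All item responses below are invented:

```python
# Step 1 sketch: build the correlation matrix for a few hypothetical
# items and flag pairs with |r| < 0.3 (weak candidates for factoring).
def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

items = {                      # hypothetical 5-point scale responses
    "item1": [1, 2, 3, 4, 5, 4],
    "item2": [2, 2, 3, 5, 4, 5],
    "item3": [5, 3, 1, 2, 2, 1],
}
names = list(items)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson_r(items[a], items[b])
        print(a, b, round(r, 2), "ok" if abs(r) > 0.3 else "weak")
```

In this toy data every pair clears the 0.3 rule of thumb (item3 correlates negatively, which still counts, since the slide says to think of correlations in absolute value).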
116. Steps in Factor Analysis: The Correlation Matrix • Bartlett Test of Sphericity:  used to test the hypothesis that the correlation matrix is an identity matrix (all diagonal terms are 1 and all off-diagonal terms are 0).  If the value of the test statistic for sphericity is large and the associated significance level is small, it is unlikely that the population correlation matrix is an identity. • If the hypothesis that the population correlation matrix is an identity cannot be rejected because the observed significance level is large, the use of the factor model should be reconsidered.
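A hedged worked example of the Bartlett statistic for the simplest case of two variables, using the common chi-square approximation chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom; the sample size and correlation below are hypothetical:

```python
import math

# Bartlett's sphericity statistic, 2-variable sketch.
# Under sphericity R is the identity, det R = 1, ln(det R) = 0, so chi2 = 0;
# the further det R falls below 1, the larger chi2 gets.
n, p, r = 100, 2, 0.6          # hypothetical sample size and correlation
det_R = 1 - r ** 2             # determinant of a 2x2 correlation matrix [[1, r], [r, 1]]
chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det_R)
df = p * (p - 1) // 2
# Compare with the 5% chi-square critical value for 1 df (about 3.84).
print(round(chi2, 2), df)  # → 43.51 1
```

Here chi2 is far above 3.84, so the identity hypothesis would be rejected and factoring would be considered appropriate, matching the decision rule on the slide.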
117. Steps in Factor Analysis: The Correlation Matrix • The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:  is an index for comparing the magnitude of the observed correlation coefficients to the magnitude of the partial correlation coefficients.  KMO values closer to 1 indicate sizeable sampling adequacy (.8 and higher are great, .7 is acceptable, .6 is mediocre, less than .5 is unacceptable).  Reasonably large values are needed for a good factor analysis. Small KMO values indicate that a factor analysis of the variables may not be a good idea.
118. Steps in Factor Analysis: Factor Extraction  2nd Step: Factor extraction  The primary objective of this stage is to determine the factors.  Initial decisions can be made here about the number of factors underlying a set of measured variables.  Estimates of the initial factors are obtained using principal components analysis.  Principal components analysis is the most commonly used extraction method. Other factor extraction methods include:  Maximum likelihood method  Principal axis factoring  Alpha method  Unweighted least squares method  Generalized least squares method  Image factoring.
119. Steps in Factor Analysis: Factor Extraction  In principal components analysis, linear combinations of the observed variables are formed.  The 1st principal component is the combination that accounts for the largest amount of variance in the sample (1st extracted factor).  The 2nd principal component accounts for the next largest amount of variance and is uncorrelated with the first (2nd extracted factor).  Successive components explain progressively smaller portions of the total sample variance, and all are uncorrelated with each other.
120. Steps in Factor Analysis: Factor Extraction  To decide on how many factors we need to represent the data, we use 2 statistical criteria:  Eigen values, and  The Scree Plot.  The determination of the number of factors is usually done by considering only factors with Eigen values greater than 1.  Factors with a variance less than 1 are no better than a single variable, since each variable is expected to have a variance of 1.
Total Variance Explained (Extraction Method: Principal Component Analysis)
Component   Total   % of Variance   Cumulative %
1           3.046       30.465          30.465
2           1.801       18.011          48.476
3           1.009       10.091          58.566
4            .934        9.336          67.902
5            .840        8.404          76.307
6            .711        7.107          83.414
7            .574        5.737          89.151
8            .440        4.396          93.547
9            .337        3.368          96.915
10           .308        3.085         100.000
(The Extraction Sums of Squared Loadings repeat the initial eigenvalues for the three retained components.)
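For intuition about the eigenvalue-greater-than-1 (Kaiser) rule, the two-variable case has a closed form: a 2x2 correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r. The r used below is hypothetical:

```python
# Kaiser rule sketch for two items: eigenvalues of [[1, r], [r, 1]]
# are 1 + r and 1 - r, and their sum equals the number of variables (2),
# so each eigenvalue / 2 is the share of total variance explained.
r = 0.6                                  # hypothetical inter-item correlation
eigenvalues = sorted([1 + r, 1 - r], reverse=True)
kept = [ev for ev in eigenvalues if ev > 1]          # Kaiser criterion
explained = [ev / 2 * 100 for ev in eigenvalues]     # % of total variance
print([round(e, 1) for e in eigenvalues], len(kept), [round(e, 1) for e in explained])
```

With r = 0.6 the first component (eigenvalue 1.6, 80% of the variance) is kept and the second (0.4) is dropped, illustrating why a component with variance below 1 explains less than a single standardized variable would.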
121. Steps in Factor Analysis: Factor Extraction  The examination of the Scree plot provides a visual of the total variance associated with each factor.  The steep slope shows the large factors.  The gradual trailing off (scree) shows the rest of the factors, usually lower than an Eigen value of 1.  In choosing the number of factors, in addition to the statistical criteria, one should make initial decisions based on conceptual and theoretical grounds.  At this stage, the decision about the number of factors is not final.
122. Steps in Factor Analysis: Factor Extraction
Component Matrix using Principal Component Analysis (Extraction Method: Principal Component Analysis; 3 components extracted)
Item (loadings on Components 1 / 2 / 3):
• I discussed my frustrations and feelings with person(s) in school: .771 / -.271 / .121
• I tried to develop a step-by-step plan of action to remedy the problems: .545 / .530 / .264
• I expressed my emotions to my family and close friends: .580 / -.311 / .265
• I read, attended workshops, or sought some other educational approach to correct the problem: .398 / .356 / -.374
• I tried to be emotionally honest with myself about the problems: .436 / .441 / -.368
• I sought advice from others on how I should solve the problems: .705 / -.362 / .117
• I explored the emotions caused by the problems: .594 / .184 / -.537
• I took direct action to try to correct the problems: .074 / .640 / .443
• I told someone I could trust about how I felt about the problems: .752 / -.351 / .081
• I put aside other activities so that I could work to solve the problems: .225 / .576 / .272
123. Steps in Factor Analysis: Factor Rotation  3rd Step: Factor rotation.  In this step, factors are rotated.  Un-rotated factors are typically not very interpretable (most factors are correlated with many variables).  Factors are rotated to make them more meaningful and easier to interpret (each variable is associated with a minimal number of factors).  Different rotation methods may result in the identification of somewhat different factors.
124. Steps in Factor Analysis: Factor Rotation  The most popular rotational method is Varimax rotation.  Varimax uses orthogonal rotations, yielding uncorrelated factors/components.  Varimax attempts to minimize the number of variables that have high loadings on a factor. This enhances the interpretability of the factors.
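With only two components, an orthogonal rotation is fully described by a single angle, so varimax can be illustrated by a simple grid search over angles. The loadings below are invented, and real varimax implementations iterate pairwise rotations rather than scanning angles; this is only a sketch of the criterion being maximized:

```python
import math

# Varimax sketch for 2 factors: rotate hypothetical loadings through a grid
# of angles and keep the angle maximizing the varimax criterion
# (the summed variance of the squared loadings within each column).
loadings = [(0.7, 0.5), (0.6, 0.6), (0.5, -0.6), (0.6, -0.5)]  # hypothetical

def rotate(load, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in load]

def varimax_criterion(load):
    p = len(load)
    total = 0.0
    for j in (0, 1):
        sq = [row[j] ** 2 for row in load]
        total += sum(v ** 2 for v in sq) / p - (sum(sq) / p) ** 2
    return total

best = max((math.radians(d) for d in range(0, 90)),
           key=lambda t: varimax_criterion(rotate(loadings, t)))
rotated = rotate(loadings, best)
print(round(math.degrees(best)), [(round(a, 2), round(b, 2)) for a, b in rotated])
```

After rotation, the first two variables load almost entirely on one component and the last two on the other: the simple structure that makes naming the factors easy.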
125. Steps in Factor Analysis: Factor Rotation • Other common rotational methods include oblique rotations, which yield correlated factors. • Oblique rotations are less frequently used because their results are more difficult to summarize. • Other rotational methods include:  Quartimax (orthogonal)  Equamax (orthogonal)  Promax (oblique)
126. Steps in Factor Analysis: Factor Rotation © Dr. Maher Khelifa • A factor is interpreted or named by examining the largest values linking the factor to the measured variables in the rotated factor matrix.
Rotated Component Matrix (Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization; rotation converged in 5 iterations)
Item (loadings on Components 1 / 2 / 3):
• I discussed my frustrations and feelings with person(s) in school: .803 / .186 / .050
• I tried to develop a step-by-step plan of action to remedy the problems: .270 / .304 / .694
• I expressed my emotions to my family and close friends: .706 / -.036 / .059
• I read, attended workshops, or sought some other educational approach to correct the problem: .050 / .633 / .145
• I tried to be emotionally honest with myself about the problems: .042 / .685 / .222
• I sought advice from others on how I should solve the problems: .792 / .117 / -.038
• I explored the emotions caused by the problems: .248 / .782 / -.037
• I took direct action to try to correct the problems: -.120 / -.023 / .772
• I told someone I could trust about how I felt about the problems: .815 / .172 / -.040
• I put aside other activities so that I could work to solve the problems: -.014 / .155 / .657
127. Steps in Factor Analysis: Making Final Decisions • 4th Step: Making final decisions • The final decision about the number of factors to choose is the number of factors for the rotated solution that is most interpretable. • To identify factors, group variables that have large loadings for the same factor. • Plots of loadings provide a visual for variable clusters. • Interpret factors according to the meaning of the variables. • This decision should be guided by: • A priori conceptual beliefs about the number of factors from past research or theory • Eigen values computed in step 2 • The relative interpretability of rotated solutions computed in step 3
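The grouping step can be sketched as "assign each item to the factor with its largest absolute rotated loading". The loadings below are taken from the rotated component matrix shown earlier (item names abbreviated):

```python
# Group items by their largest absolute rotated loading.
# Loadings come from the deck's rotated component matrix (3 of the 10 items).
rotated_loadings = {
    "discussed frustrations with person(s) in school": (0.803, 0.186, 0.050),
    "explored the emotions caused by the problems": (0.248, 0.782, -0.037),
    "took direct action to try to correct the problems": (-0.120, -0.023, 0.772),
}
assignments = {item: max(range(3), key=lambda j: abs(loads[j])) + 1
               for item, loads in rotated_loadings.items()}
for item, f in sorted(assignments.items(), key=lambda kv: kv[1]):
    print(f"Factor {f}: {item}")
```

The three items land on three different factors, which is why the rotated solution is read as separate "talking to others", "emotional processing", and "direct action" dimensions; naming the factors from these clusters is the interpretive step the slide describes.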
128. Assumptions Underlying Factor Analysis • Assumptions underlying factor analysis include: • The measured variables are linearly related to the factors + errors. • This assumption is likely to be violated if items have limited response scales (two-point response scales like True/False or Right/Wrong items). • The data should have a bivariate normal distribution for each pair of variables. • Observations are independent. • The factor analysis model assumes that variables are determined by common factors and unique factors. All unique factors are assumed to be uncorrelated with each other and with the common factors.
129. Obtaining a Factor Analysis • Click: • Analyze and select • Dimension Reduction • Factor • A Factor Analysis box will appear
130. Obtaining a Factor Analysis • Move variables/scale items to the Variables box
131. Obtaining a Factor Analysis • Factor extraction • When variables are in the variable box, select: • Extraction
132. Obtaining a Factor Analysis • When the Factor Extraction box appears, select: • Scree Plot • Keep all default selections including: • Principal Component Analysis • Based on Eigen value of 1, and • Un-rotated factor solution
133. Obtaining a Factor Analysis • During factor extraction keep the factor rotation default of: • None • Press continue
134. Obtaining a Factor Analysis • During Factor Rotation: • Decide on the number of factors based on the factor extraction phase by choosing: • Fixed number of factors and entering the desired number of factors to extract. • Under Rotation choose Varimax • Press continue • Then OK