Identifying Subgroups of Potential Youth Smokers Using Cluster Analysis
IDENTIFYING SUBGROUPS OF POTENTIAL YOUTH SMOKERS
USING LATENT PROFILE ANALYSIS:
A COMPARISON WITH WARD’S METHOD CLUSTER ANALYSIS
BY
DAVID LEE VILLEGAS
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
MASTER OF ARTS
IN PSYCHOLOGY
UNIVERSITY OF RHODE ISLAND
2012
MASTER OF ARTS IN PSYCHOLOGY
OF
DAVID LEE VILLEGAS
APPROVED:
Thesis Committee:
Major Professor Robert G. Laforge
Wayne F. Velicer
Lisa L. Weyandt
Judy A. Van Wyk
Nasser H. Zawia
DEAN OF THE GRADUATE SCHOOL
UNIVERSITY OF RHODE ISLAND
2012
ABSTRACT
The purpose of this thesis was to compare Ward’s method cluster analysis and latent
profile analysis in their identification of subgroups of youth who are not yet smokers but
may become smokers in the near future. The variables used to identify these subgroups are
measures indicative of behavior change known as the Pros of Smoking, Cons of Smoking, and
Situational Temptations to Smoke. Both analyses were conducted within 5 random
subsamples drawn from a larger pool (N=3,493) of 6th grade children attending
elementary school in the state of Rhode Island. Evidence for the existence of a 4-cluster
solution of potential youth smokers was found with both methods despite variability in
the actual assignment of cases to subgroups within and across methods. The use of
Ward’s method and latent profile analysis together allowed for the identification of
additional cases at elevated risk for smoking acquisition.
ACKNOWLEDGMENTS
I would like to acknowledge my Major Professor, Dr. Robert G. Laforge.
Through his guidance and support I was able to develop a better understanding of the
methodology incorporated in this thesis and improve upon my writing. I am also
thankful to Dr. Wayne F. Velicer, who allowed me to use his Project Best dataset to
complete this research endeavor and provided me with what I consider my initial, in-
depth exposure to classification methods. Lastly, I would like to thank my remaining
Core Committee members and Defense Chair, Dr. Lisa L. Weyandt, Dr. Judy A. Van
Wyk, and Dr. Diane C. Martins, for their invaluable contribution of time, feedback,
and moderation.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
CHAPTER 2
REVIEW OF THE LITERATURE
CHAPTER 3
METHODOLOGY
CHAPTER 4
RESULTS
CHAPTER 5
CONCLUSION
APPENDICES
BIBLIOGRAPHY
LIST OF TABLES
Table 1. Subsample Demographics by Size, Percentage of Subsample, and
Cluster Indicator Means and Standard Deviations
Table 2. Correlation Between Cluster Indicators (N=3,493)
Table 3. Test Results for Univariate and Multivariate Normal Distributions
for Raw Indicators
Table 4. Test Results for Univariate and Multivariate Normal Distributions
for Ln Transformed Indicators
Table 5. Summary of Ln Transformed LPA Information Criteria and Group
Size Estimates Across Subsamples
Table 6. Agreement Between Ward’s Method and LPA Ln Transformed
4-Cluster Solutions
Table 7. Cross-Tabulation of Descriptive Statistics for Subsample 1 Latent
Profile by Ward’s Method Clusters Across Cluster Indicators: Pros
of Smoking (P), Cons of Smoking (C), and Temptations to Smoke (T)
Table 8. Cross-Tabulation of Descriptive Statistics for Subsample 2 Latent
Profile by Ward’s Method Clusters Across Cluster Indicators: Pros
of Smoking (P), Cons of Smoking (C), and Temptations to Smoke (T)
Table 9. Cross-Tabulation of Descriptive Statistics for Subsample 3 Latent
Profile by Ward’s Method Clusters Across Cluster Indicators: Pros
of Smoking (P), Cons of Smoking (C), and Temptations to Smoke (T)
Table 10. Cross-Tabulation of Descriptive Statistics for Subsample 4 Latent
Profile by Ward’s Method Clusters Across Cluster Indicators: Pros
of Smoking (P), Cons of Smoking (C), and Temptations to Smoke (T)
Table 11. Cross-Tabulation of Descriptive Statistics for Subsample 5 Latent
Profile by Ward’s Method Clusters Across Cluster Indicators: Pros
of Smoking (P), Cons of Smoking (C), and Temptations to Smoke (T)
CHAPTER 1
INTRODUCTION
Clustering and classification methods are widely used in engineering,
the biological sciences, and more recently the behavioral sciences for one purpose: to
explore and better understand the underlying homogeneity within a
heterogeneous population of interest. Despite this commonality, researchers using these
statistical procedures seldom agree on which of the many available clustering algorithms
will yield the most reliable and valid results. This is no doubt due partly to the lack of a
common mathematical background among users, who, differences aside, would most
likely agree that decisions about which clustering algorithm to use should be data-driven.
To complicate matters further, it is not always the case that a single method is
the only one that can be applied to a particular set of data. In such situations,
researchers dealing with non-simulated data may find themselves with little guidance on
selecting an optimal clustering procedure.
In this thesis, this scenario is explored in depth through the comparison of two
clustering algorithms (i.e., hierarchical Ward’s method cluster analysis and latent profile
analysis [LPA]) applied to the same real-world data. The objective was to develop a better
understanding of the unique contributions of LPA and Ward’s method cluster analysis in
exploring the heterogeneity of potential youth smokers by comparing the methods’
results in a more objective and comprehensive manner than seen in previous studies (see