Healthy Food Accessibility and Obesity: Case Study of Pennsylvania, USA
Ranjay Shrestha
Earth System and Geoinformation Sciences, GMU
Center for Spatial Information Science and Systems
Fairfax, VA
rshresth@masonlive.gmu.edu
Ron Mahabir
Earth System and Geoinformation Sciences, GMU
Fairfax, VA
rmahabir@masonlive.gmu.edu
Liping Di
Center for Spatial Science and Systems
Fairfax, VA
ldi@gmu.edu
Abstract-Obesity is a continuing challenge for any town, city
or country faced with this problem. Being obese increases the
risk of physical disorders such as high blood pressure (BP), high
blood cholesterol, diabetes, coronary heart disease, stroke, cancer
and poor reproductive health. Higher obesity rates also lead to
an increased economic burden on society. In order to better
understand and control obesity rates, the influence of various
factors on its prevalence should be investigated. We used
Ordinary Least Squares (OLS) and Geographically Weighted
Regression (GWR) models to analyze spatial relationships using
a combination of socio-economic and physical factors for counties
in Pennsylvania (PA), USA for 2010. Our findings suggest that
the rate of obesity is impacted by local spatial variation and that
its prevalence is positively correlated with diabetes, physical
inactivity and the distance that a person must travel to get to a
healthy food store. Additionally, GWR (AICc = 261.59;
r-squared = 0.45) was found to significantly improve model
fitting over OLS (AICc = 299.87; r-squared = 0.34). These results
indicate that additional factors, including social, cultural and
behavioral, are needed to better explain the distribution of
obesity rates across PA.
Keywords-Obesity; Food Accessibility; Ordinary Least Squares;
Geographically Weighted Regression
I. INTRODUCTION
A. What is obesity?
According to the World Health Organization (WHO) [16],
obesity is defined as abnormal or excessive fat accumulation
that presents a risk to an individual's health. People are
considered obese when their Body Mass Index (BMI)
is 30 kg/m² or higher [8, 16].
B. Why is it important?
Increasingly, obesity has become a major health
concern in both developed and developing countries [10, 12].
People with excessive weight are more likely to suffer from
various physical disorders such as high blood pressure (BP),
high blood cholesterol, diabetes, coronary heart disease, stroke,
cancer and poor reproductive health. Besides physical health
problems, obese people are also vulnerable to mental disorders
such as depression and irregular eating [12]. In the United States
approximately 365,000 deaths per year are related to obesity,
second only to tobacco [1]. These high death rates are just a
small part of the problem. The major issue with higher obesity
rates is the economic burden they place on society. In
2000, the total direct and indirect cost of overweight and
obesity was about 117 billion US dollars [13].
As mentioned earlier, obese people are prone to various health
issues, and for any town, city, or country an unhealthy
community presents an economic strain. Much of the
taxpayers' money will be spent on healthcare and health
services, money which otherwise could have been used for
other purposes to the benefit of the community.
C. Factors Influencing Obesity
Many studies have investigated the influence of various
socio-economic and physical factors on obesity. Such
information is important in order to better understand and
control the prevalence of the disease at both the local and
national level. Grabner [10] found a positive
correlation between education and health: people who are
educated are less likely to be obese than people
without education. Similar results were obtained by Kenkel et
al. [5], who found that individuals who completed high school
or a GED were less likely to suffer from obesity. This makes
sense since educated individuals are more likely to understand
the role of a balanced diet and nutritional values and are more
likely to practice a healthier lifestyle.
Another important factor found to be related to obesity is
accessibility (or lack of it) to food options. Brennan &
Carpenter [4], looking at child obesity, examined whether
easy access to fast food could influence weight
gain. The results of this study showed a strong positive
relationship between fast food access and child obesity.
Furthermore, Brennan & Carpenter suggested that it
would be useful to consider changes in school eating policies
to provide children with alternatives to fast food in schools.
This suggests that easy access to healthier food
options may help in reducing obesity. Numerous other factors
have been found to influence the prevalence of obesity. These
are known to vary with location [15] and behavioral
characteristics [6, 11], with research ongoing in these areas.
The main objectives of this research are (1) to determine
the impact of accessibility to healthy foods and other social
and economic factors on the prevalence of obesity
in Pennsylvania, United States, and (2) to further determine the
influence of space in modeling obesity rates. Section II gives a
brief introduction to the study area, data, and tools utilized.
Section III explains the methods used to process and
analyze the data, while results and analysis are discussed in
Section IV. Finally, the paper concludes with final remarks
in Section V.
II. STUDY AREA/DATA
A. Study Area
The study area used in this research is the state of
Pennsylvania (PA), USA. Geographically it is located in the
Northeastern region of the United States. It is the
9th most densely populated and 6th most populated state in the
US [14]. The state of PA has a total of 67 counties and all are
included within the scope of this research. Fig. 1 shows the
geographic location and county divisions of PA.
Fig. 1. Study Area: State of PA, USA
B. Data and Tools
Data on obesity rates and on the factors identified in the
literature as affecting its prevalence were collected for PA
for 2010 (Table I). In order to preprocess and analyze these
datasets, the ArcGIS geographic information system software
suite (version 10.1) was utilized.
Obesity rates for counties in PA are shown in Fig. 2. This
figure shows that counties in the middle of the state are more
heavily impacted by obesity compared to counties in the
eastern and western parts of PA.
Fig. 2. Obesity Distribution in the State of PA - County Level (legend: obesity rates in five classes, 21.20-24.24, 24.24-27.28, 27.28-30.32, 30.32-33.36 and 33.36-36.40)
TABLE I. DATASETS
Category | Factors | Data source | Geographic scale
Health | Obesity, Physical Inactivity, Diabetes | American FindTheData | County
Administrative boundaries | State, County, Census Tract | Tiger Line | Varies with dataset
Socioeconomic | Total poverty, Median household income, median family income, mean population age, males, females, Age 16 to 19 in school, Age 16 to 19 not in school | American FactFinder | County
Access to healthy food | Healthy food store locations (1) | Reference USA | Point locations
(1) Healthy food stores were identified as grocery stores and farmers' markets.
III. METHODS
A. Data Preprocessing
1) Aggregation to County Level: Health and socioeconomic
factors collected in tabular format were appended as aspatial
attributes to county polygons.
2) Food Stores Filtering: Only grocery stores which
provide healthier food options were included. Small convenience
stores and gas station food stores were excluded.
3) Average Nearest Distance: To determine food
accessibility, the network distance from the centroid of each
census tract in PA to the nearest healthy food store was
calculated using the Network Analyst tool in ArcGIS. This
information was then used to calculate the average nearest
distance travelled per person for each county using Eq. 1.
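$$
\mathrm{Dist\_Travel\_PerPopulation}_{county} = \frac{\sum \left( \mathrm{Dist}_{tract} \times \mathrm{Pop}_{tract} \right)}{\mathrm{TotPop}_{county}} \qquad (1)
$$
Where the sum runs over the census tracts of the county and
$\mathrm{Dist\_Travel\_PerPopulation}_{county}$ = Average network distance a
person has to travel to the nearest food store in a particular
county
$\mathrm{Dist}_{tract}$ = Network distance to the nearest food store for that
particular census tract
$\mathrm{Pop}_{tract}$ = Total population in that particular census tract
$\mathrm{TotPop}_{county}$ = Total population within that particular county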
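As an illustration only, Eq. 1 can be sketched in a few lines of Python. The record layout (county name, tract distance, tract population), the function name and the example values are assumptions made for this sketch and are not taken from the study's ArcGIS workflow. The sketch also assumes that the county population equals the sum of its tract populations.

```python
from collections import defaultdict


def avg_distance_per_person(tracts):
    """Population-weighted mean nearest-store distance per county (Eq. 1).

    `tracts` is an iterable of (county, distance, population) tuples.
    This layout is a hypothetical choice for illustration.
    """
    weighted_sum = defaultdict(float)  # sum of Dist_tract * Pop_tract
    total_pop = defaultdict(float)     # TotPop_county (sum of tract populations)
    for county, distance, population in tracts:
        weighted_sum[county] += distance * population
        total_pop[county] += population
    # Skip counties without population to avoid dividing by zero.
    return {
        county: weighted_sum[county] / total_pop[county]
        for county in weighted_sum
        if total_pop[county] > 0
    }


# Made-up example values, not data from the study.
example_tracts = [
    ("County A", 2.0, 1000),
    ("County A", 6.0, 3000),
    ("County B", 10.0, 500),
]
result = avg_distance_per_person(example_tracts)
print(result)
```

Weighting by tract population means that a remote but sparsely populated tract contributes little to the county average, which is the intent of a per-person distance.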
B. Ordinary Least Squares Regression
The use of OLS was twofold. First, it was used to remove
multicollinear relationships between variables. Second, it was
used to build and test the suitability of a non-spatial model
between obesity and explanatory factors. In order to remove
multicollinearity, variables were screened using a combination
of Variance Inflation Factor (VIF) and Variable Significance
(VS) values. A VIF value greater than 7.5 was taken to
indicate collinearity [9]. The variable with the
highest VIF value above this threshold was
removed and the OLS procedure was re-run. If two variables
both failed the VIF test and had similar VIF values, these were
further evaluated based on their VS values. Variables with
higher VS values were kept since these reflect overall greater
model participation. Because the removal of one or more
explanatory variables impacts overall model variance, different
combinations of variables were tested. This approach was used
to ensure that the best model was selected. Model suitability
was evaluated based on a combination of the corrected Akaike
Information Criterion (AICc) and r-squared values. Lower AICc
and higher r-squared values indicated an overall better model.
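The VIF screening loop described above can be sketched as follows. This is a simplified illustration and not the ArcGIS procedure: it computes each VIF as the corresponding diagonal element of the inverse correlation matrix, drops the variable with the highest VIF above 7.5, and repeats. The tie-break on Variable Significance is omitted, and the function names and example data are hypothetical. Perfectly collinear inputs would make the matrix singular and are not handled.

```python
import math

VIF_THRESHOLD = 7.5  # threshold quoted in the text [9]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def screen(columns, threshold=VIF_THRESHOLD):
    """Repeatedly drop the variable with the highest VIF above the threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return kept


# Made-up variables: "b" is almost a copy of "a", "c" is largely unrelated.
candidates = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
kept = screen(candidates)
print(sorted(kept), vif(kept))
```

In the example, one of the two nearly identical variables is removed and the remaining pair passes the threshold.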
C. Spatial Autocorrelation
Moran's I was used to test for spatial autocorrelation using
the standardized residuals from the OLS model and polygon
contiguity as the spatial relationship between observations.
This was necessary to ensure that there was no systematic
pattern indicating clustering and therefore an unsuitable or
biased model.
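For readers unfamiliar with the statistic, a minimal sketch of global Moran's I is given below. It assumes binary contiguity weights (1 for neighboring polygons, 0 otherwise), which is one common choice and not necessarily the weighting applied by the software used in this study. The function name, the neighbor-list layout and the example values are hypothetical, and the significance test (z-score) is not included.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    `values[i]` is the residual of polygon i and `neighbors[i]` lists
    the indices of the polygons contiguous to polygon i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: sum of all weights (each neighbor link counted once per direction).
    s0 = sum(len(nbrs) for nbrs in neighbors)
    # Cross-products of deviations over all neighboring pairs.
    cross = sum(dev[i] * dev[j]
                for i, nbrs in enumerate(neighbors)
                for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in dev)


# Made-up example: four polygons in a row with steadily increasing values.
values = [1.0, 2.0, 3.0, 4.0]
neighbors = [[1], [0, 2], [1, 3], [2]]
i_stat = morans_i(values, neighbors)
print(i_stat)
```

Values of I above the expected value under no autocorrelation, -1/(n - 1), point towards clustering of similar residuals, while values below it point towards dispersion.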
D. Geographically Weighted Regression (GWR)
GWR was used to build a model showing the impact of
local spatial variation of explanatory variables on obesity rates.
A fixed distance kernel, with AICc used to determine its spatial
extent, was used to develop the final model. A fixed kernel was
chosen because the centroids of counties were used as
observations, and the spatial scale of these observations is
expected to be relatively stable across space. As it relates to
AICc, this method chooses the optimum bandwidth for the kernel
based on the tradeoff between model bias and the variance
explained by the model.
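The idea of a fixed kernel can be illustrated with a deliberately reduced GWR sketch: one explanatory variable, a Gaussian distance-decay kernel, and a bandwidth supplied by hand. The Gaussian form, the function names and the example data are assumptions made for illustration. The AICc-based bandwidth search described above is not implemented here.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Fixed Gaussian kernel: weight decays with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, points, x, y, bandwidth):
    """Weighted least-squares line y = a + b * x centred on `target`."""
    w = [gaussian_weight(math.dist(target, p), bandwidth) for p in points]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def gwr(points, x, y, bandwidth):
    """One (intercept, slope) pair per observation location."""
    return [local_fit(p, points, x, y, bandwidth) for p in points]


# Made-up data: the x-y relationship is y = x in the west and y = 3x in the east.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]
fits = gwr(points, x, y, bandwidth=2.0)
for location, (intercept, slope) in zip(points, fits):
    print(location, round(intercept, 3), round(slope, 3))
```

With a small bandwidth the local slopes follow the regional relationships. As the bandwidth grows, every observation receives nearly equal weight and the local fits converge towards a single global OLS line.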
IV. RESULTS AND ANALYSIS
The results of this research and their analysis are presented
in the following sections.
A. Average distance travelled per person to food store
The average nearest distance travelled per person to a food
store is shown in Fig. 3. This figure shows that persons
living in counties in the middle to northern parts of PA have to
travel upwards of 8 km on average to get to the
nearest healthy food store.
Fig. 3. Average Nearest Distance to Food Stores (legend classes: <8, 8-16, 16-24, 24-32, >=32)
B. Ordinary Least Squares Regression
The OLS procedure produced a linear model with 3
explanatory variables that best accounted for the variance
observed in obesity rates among the models tested (Table II).
As Table II shows, all
explanatory variables show a positive relationship with the
rate of obesity, with variables in decreasing order of influence
being diabetes, physical inactivity and average distance. The t
statistic and probability value suggest that the coefficient for
physical inactivity is statistically significant at the 95%
level, while the low VIF values for the explanatory variables
indicate that multicollinearity has been removed from this
model. Although the other explanatory variables (diabetes and
average distance) were not statistically
significant, as Table II shows, one of the objectives of this
study was to assess the influence of space on the distribution
of obesity rates. It
was therefore necessary to keep all explanatory variables so
that a comparison could be made between the spatial and
non-spatial model outputs.
TABLE II. OLS MODEL
Variable | Coefficient | t-Statistic | Probability | VIF
Intercept | 15.514312 | 5.674965 | 0 | NA
Diabetes | 0.51696 | 1.869992 | 0.066135 | 1.434994
Physical Inactivity | 0.325889 | 3.30219 | 0.001586 | 1.367493
Average Distance | 0.000037 | 0.98089 | 0.330395 | 1.068323
Fig. 4. Histogram of standardized residuals of OLS
Values for multiple r-squared and adjusted r-squared
were 0.34 and 0.31 respectively, with a resulting AICc value of
299.87. These values suggest a poor or less than optimum
model fit. Furthermore, the p-value for the Koenker (BP) statistic
was 0.58, suggesting a stationary relationship between obesity
and explanatory factors. Moreover, the OLS model has a Joint
F-Statistic p-value of 0.000009, indicating that the model is
statistically significant overall. Fig. 4 further shows that the
standardized residuals of the model approximately follow a
normal distribution (blue line), which suggests that the model
is not biased.
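As a rough consistency check on these figures, the sketch below recomputes the AICc and the adjusted r-squared from their textbook definitions for a least-squares fit with Gaussian errors. Several inputs are assumptions: the residual sum of squares of 292.44 is taken from the all-observations column of Table III and is assumed to apply to the OLS fit as well, the model is taken to have four coefficients (intercept plus three variables), and the exact formula used by the software is not stated in the text.

```python
import math


def aicc_gaussian(rss, n, n_coef):
    """AICc of a least-squares fit assuming Gaussian errors.

    k counts the regression coefficients plus the error variance.
    """
    k = n_coef + 1
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    # AIC + small-sample correction simplifies to -2 logL + 2kn / (n - k - 1).
    return -2 * log_lik + 2 * k * n / (n - k - 1)


def adjusted_r2(r2, n, n_predictors):
    """Adjusted r-squared for n observations and a given number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


aicc = aicc_gaussian(rss=292.44, n=67, n_coef=4)
adj = adjusted_r2(r2=0.34, n=67, n_predictors=3)
print(round(aicc, 2), round(adj, 2))
```

Under these assumptions the sketch gives an AICc of about 299.85 and an adjusted r-squared of about 0.31, close to the reported 299.87 and 0.31. The small gap in AICc may come from rounding of the reported inputs or from a slightly different formula in the software.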
C. Spatial Autocorrelation
Fig. 5. Moran's I
The output of Moran's I is shown in Fig. 5. Given the
z-score of 0.53, the pattern does not appear to be
significantly different from random. Fig. 6 shows the
distribution of standardized residuals across PA counties.
This figure does not show any spatial pattern and therefore
agrees with the result of Moran's I. Additionally, Fig. 6
shows that the model performs reasonably well, with one
location, Columbia County in dark red, being
under-predicted (standard deviation of residual > 2.5).
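The reading of the z-score can be made explicit with a short sketch that converts it to a two-sided p-value, assuming the z-score is referred to a standard normal distribution. The function name is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


p_value = two_sided_p(0.53)
print(round(p_value, 3))
```

For z = 0.53 this gives a p-value of roughly 0.6, far above the 0.05 level, which is consistent with the conclusion that the residual pattern is not significantly different from random.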
Fig. 6. Spatial distribution of OLS standardized residuals
D. Geographically Weighted Regression
The results of GWR for all observations are shown in
Table III, column 2. These results are similar to the output of
OLS (Table II). Fig 7 shows the spatial distribution of
residuals from GWR. This figure shows very similar results to
the residuals of OLS in Fig 6.
Fig. 7. Spatial distribution of GWR Standardized Residuals (All observations)
TABLE III. GWR OUTPUT
Parameter | GWR All observations | GWR Outlier removed | GWR Outlier + Clusters removed
Observations | 67 | 66 | 61
Bandwidth | 47.73 | 2.31 | 2.95
Residual Squares | 292.44 | 240.97 | 208.72
AICc | 299.87 | 293.25 | 261.59
R2 | 0.34 | 0.44 | 0.45
R2 Adjusted | 0.31 | 0.35 | 0.40
Because of the similar results between OLS and GWR, cluster
and outlier analysis was performed using Anselin Local
Moran's I to test for local spatial autocorrelation in the
standardized residuals of GWR. Fig. 8 shows the results of this
analysis, with several areas identified as having local clustering
and outliers. Table III shows the results of re-running GWR
with outliers removed (column 3), followed by the removal of
clustered observations (column 4). This table shows improved
model parameters for AICc and r-squared (including adjusted
r-squared) with the subsequent removal of observations identified
as problem areas. Also evident from Table III is that the bandwidth
and the sum of residual squares were reduced, indicating better
model fitting with increased local variation and a move away
from a model of global influence to one of local influence.
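The quadrant types reported by a cluster and outlier analysis can be illustrated with the simplified sketch below. It assumes row-standardized contiguity weights and only assigns each location to a quadrant (High-High, Low-Low, High-Low, Low-High). The permutation-based significance test that decides which locations are flagged as clusters or outliers is not included, and the function name and example data are hypothetical.

```python
def local_morans(values, neighbors):
    """Local Moran's I and quadrant type per location.

    Uses row-standardized weights: the spatial lag of a location is the
    mean deviation of its neighbors. Significance testing is omitted.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the deviations
    results = []
    for i, nbrs in enumerate(neighbors):
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = dev[i] * lag / m2
        if dev[i] >= 0 and lag >= 0:
            label = "High-High"
        elif dev[i] < 0 and lag < 0:
            label = "Low-Low"
        elif dev[i] >= 0:
            label = "High-Low"
        else:
            label = "Low-High"
        results.append((local_i, label))
    return results


# Made-up example: five polygons in a row, high values west, low values east.
values = [2.0, 2.0, 2.0, -3.0, -3.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
local = local_morans(values, neighbors)
for value, (local_i, label) in zip(values, local):
    print(value, round(local_i, 3), label)
```

A positive local statistic marks a location that resembles its neighbors (a candidate cluster), while a negative one marks a location that differs from them (a candidate outlier).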
Fig. 8. Cluster and Outlier Results (legend: Not Significant, High-High Cluster, High-Low Outlier, Low-High Outlier, Low-Low Cluster)
Fig. 9. Spatial Distribution of GWR Standardized Residuals (local clusters and outliers removed)
V. CONCLUSION
Obesity continues to be a pressing problem affecting the
health and well-being of persons worldwide. Especially
affected is the United States, which ranks 18th in the world
in adult obesity prevalence [3].
The results of this study showed that obesity rates in PA are
impacted by various factors: in decreasing order of influence,
diabetes, physical inactivity and average distance to the
nearest healthy food store, according to the OLS model. The
AICc and r-squared values for this model were 299.87 and 0.34
respectively. These values suggest that only about one third of
the variance in obesity rates can be explained using this
model. Similar results were found using GWR, which
subsequently led to an improved model (AICc = 261.59;
r-squared = 0.45) with the removal of local spatial clustering of
observations. These results indicate that GWR significantly
improves model fitting over OLS. Furthermore, although both
OLS and GWR accounted for a low share of the variance (OLS =
34%; GWR = 45%), these results indicate that the rate of
obesity is impacted by local spatial variation.
The low explained variance of these models suggests that
additional factors are needed to better explain the distribution
of obesity rates across PA. This could be due to the limited
coverage of the variables used, which do not include factors such
as social (e.g. influence of people and surrounding environment),
cultural (e.g. cooking at home) and behavioral (e.g. habit of
eating healthy or fast food). Furthermore, because of the scale
of analysis of this study, county level, results may be
generalized and not reflect possible underlying local
variation in the relationship between obesity and explanatory
variables. This issue will be addressed in future work using more
disaggregated datasets. Also, because obesity may be
impacted by changes in season (e.g. people may be
likely to eat more during winter than in summer because they
spend longer periods indoors), this factor will also be
investigated. Finally, further analysis will incorporate the
distances to fast food outlets to examine their influence on
obesity rates, since several studies have identified this to be a
contributing factor [2, 7].
REFERENCES
[1] A. H. Mokdad, J. S. Marks, D. F. Stroup, and J. L. Gerberding,
"Correction: Actual Causes of Death in the United States, 2000,"
Journal of the American Medical Association 293 (3), 2005.
[2] B. Davis and C. Carpenter, "Proximity of Fast-Food Restaurants to
Schools and Adolescent Obesity," American Journal of Public Health,
Vol. 99, No. 3, pp. 505-510, 2009.
[3] CIA, "Country Comparison :: Obesity - adult prevalence rate," The
World Factbook, Available online at
https://www.cia.gov/library/publications/the-world-factbook/rankorder/2228rank.html,
Last accessed September 6, 2013.
[4] D. Brennan and C. Carpenter, "Proximity of Fast-Food Restaurants to
Schools and Adolescent Obesity," American Journal of Public Health,
Vol. 99, No. 3, March 2009.
[5] D. Kenkel, D. Lillard, and A. Mathios, "The Roles of High School
Completion and GED Receipt in Smoking and Obesity," Journal of
Labor Economics, Vol. 24, No. 3, pp. 635-660, July 2006.
[6] D. Smith, K. Edwards, G. Clarke and K. Harland, "Geographies of
Obesity: Environmental Understandings of the Obesity Epidemic,"
Chapter 13, Ashgate Publishing, UK, 2010.
[7] D. A. Crawford et al., "Neighbourhood fast food outlets and
obesity in children and adults: the CLAN Study," International Journal
of Pediatric Obesity 3 (4), pp. 249-256, 2008.
[8] D. W. Haslam and W. P. James, "Obesity," Lancet 366 (9492),
pp. 1197-209, 2005.
[9] ESRI, "Exploratory Regression (Spatial Statistics)," Available online at
http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000050000000,
Last accessed September 6, 2013.
[10] M. J. Grabner, "The Causal Effect of Education on Obesity: Evidence
from Compulsory Schooling Laws," JEL No. H2, I20, 2008.
[11] M. White, "Food access and Obesity," Obesity Reviews, Vol. 8, No. S1,
pp. 99-107, 2007.
[12] S. T. Yen, Z. Chen, and D. B. Eastwood, "Lifestyles, Demographics,
Dietary Behavior, and Obesity: A Switching Regression
Analysis," Health Services Research 44:4, August 2009.
[13] U.S. Department of Health and Human Services, "Healthy People 2010,"
Volume I and II, 2d Edition. Washington, DC: U.S. Government
Printing Office, 2001.
[14] United States Census Bureau, "Annual Estimates of the Resident
Population for the United States, Regions, States, and Puerto Rico: April
1, 2010 to July 1, 2011," 2011 Population Estimates, Population
Division, December 2011.
[15] W. Tzai-Hung, C. Duan-Rung and T. Meng-Ju, "Identifying
geographical variations in poverty-obesity relationships: empirical
evidence from Taiwan," Geospatial Health, 4 (2), pp. 257-265, 2010.
[16] WHO 2013, "Obesity," World Health Organization. Available online at
http://www.who.int/topics/obesity/en/, Last accessed August 2, 2013.