Correlating Factors of Firearm Fatalities in the United States
STAT 6509 Final Project
June 5th, 2013
Group #3
Christina Huang
Juhie Shinn
Leah Lane
Jennifer Chen
James Sitkin
Table of Contents
Correlating Factors of Firearm Fatalities in the United States
Table of Contents
Introduction: The topic of gun control
History: Guns in America
Gun Statistics: US Domestic Figures
Gun Statistics: US vs. the World
Graphs: There seems to be a correlation
Data and Sources:
Variable Selection
Flow of project
Initial Check of Assumptions
Analysis through plots of residuals
Box-Cox
Multicollinearity
Pearson’s Correlation Matrix
Checking Curvature Plots
Model Selections and assumptions
Backward Step-wise Selection
Plot of Residuals vs Predicted Values
Check for Normality
Breusch-Pagan
Model Diagnostics: Looking for Outliers
Studentized Deleted Residuals and Hat Matrix
DFFITS and Cook’s Distance
Interpretation of the Results
Interpretation of variables
Conclusion
Appendix A: SAS code
Appendix B: Sources
Introduction: The topic of gun control
The United States has been burdened with a wide variety of gun-related crimes for many
decades. Although most of these incidents were disregarded as minor residual effects of being a
“Gun Country,” a recent and appalling influx of major catastrophes involving firearms has
triggered a deeply divided national debate over gun control. On one side are the “gun-control
advocates” who would like to reduce firearm fatalities by making guns less accessible.
Opposing them are the “gun enthusiasts” who do not wish to implement regulatory change on
grounds of constitutionality and culture. Both sides wield legitimate arguments deserving merit,
creating a complex dilemma open to varying interpretations. Our group thinks
that an objective analysis using multiple regression may help make an opaque situation more
transparent.
History: Guns in America
From the Revolutionary War to the Wild West, firearms have played a crucial role in American
history. The right to bear arms has been protected since 1791 by the 2nd Amendment of the U.S.
Constitution, which was drafted primarily on the notion of supporting citizens’ rights to
self-defense, resistance to oppression, and the civic duty to defend the state. This amendment was
made as a direct result of the colonists taking up arms against the British, which allowed for the
U.S.’s liberation in the Revolutionary War. Later, during the 19th century, firearms were a
necessity for protecting westward-bound pioneers from wildlife and outlaws. Both factors
have made gun ownership in the U.S. an inherent right for its citizens. Although gun laws vary
from state to state, firearms can currently be legally purchased in all 50 states.
Gun Statistics: US Domestic Figures
30,000 gun deaths per year.¹
70,000 gun injuries per year.¹
NICS checks for gun purchases have doubled from 8 to 16 million in the past 10 years.¹
In 2011, overall gun-related crime was at its lowest since 1981.²
Support for a ban on handguns has decreased from 60% in 1959 to 26% in 2011.³
Around 40% of all legal gun sales involve private sellers and don't require background
checks. 40% of prison inmates who used guns in their crimes got them this way.³
Among US gun owners:⁴
1. 62% own more than one gun
2. 74% own shotguns
3. 68% own handguns
4. 17% own semi-automatic weapons
¹ FBI Stats
² Independent Journal Review, June 4, 2013
³ The Nation, Bureau of Justice
⁴ United Nations Office on Drugs and Crime (UNODC)
Gun Statistics: US vs. the World
The US has the highest gun ownership rate in the world: an average of 88 guns per 100 people.
The #2 country, Yemen, has significantly fewer at 54 guns per 100 people.⁵
The rate of death from firearms in the US is 8 times higher than that in its economic
counterparts in other parts of the world.⁶
60% of U.S. homicides are committed with a firearm, which is the 26th-highest rate in the
world.⁷
Graphs: There seems to be a correlation
Source 1: Centers for Disease Control and Prevention
Data and Sources:
For the purposes of analysis, we collected cross-sectional data by state, covering
approximately the same annual time period, from objective and reliable government agencies.
Tracking gun ownership was tricky because of privacy laws, so we followed the FBI’s method for
tracking gun sales and divided each state’s NICS checks by the state’s population to obtain an
approximate per-capita rate. Actual ownership is likely much higher, since registrants can buy
multiple guns and not all states require guns to be registered, but for the purposes of our
analysis we used the values provided by the FBI for our calculations. For independent variables
such as the percentage of very religious households per state, we used data provided by Gallup
polls, which are widely accepted sources in the social sciences. Other major sources of data for
this project include the U.S. Census Bureau and the Dept. of Justice.
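The per-capita proxy described above can be sketched as follows. This is a minimal illustration in Python rather than the SAS used for the analysis; the function name `nics_per_100k` and the input figures are hypothetical and are not taken from the FBI data.

```python
def nics_per_100k(nics_checks: int, population: int) -> float:
    """Approximate gun-ownership proxy: NICS checks per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so the division is the only rounding step.
    return nics_checks * 100_000 / population


# Hypothetical state: 250,000 checks in one year, 5,000,000 residents.
example_rate = nics_per_100k(250_000, 5_000_000)
print(example_rate)
```

Because one check can cover several guns and some sales involve no check at all, a rate computed this way is only a rough stand-in for actual ownership.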
⁵ FBI Stats
⁶ United Nations Office on Drugs and Crime (UNODC)
⁷ The Nation, Bureau of Justice
Variable Selection
After assessing the obtainable data, we decided to evaluate the following variables.
Y = Firearm Fatalities (per 100K people)
X1 = College Graduates (% of population)
X2 = Median Household Income (in $10K)
X3 = Religion (% of household “Very Religious”)
X4 = Gun Ownership (NICS per 100K people)
X5 = High School dropout rate (per 100K people)
X6 = Average Housing Prices (in $10K)
X7 = Law Enforcement Officers (per 100K people)
Flow of project
Using SAS allowed us to perform many calculations quickly, so we were able to consider
various methods of analysis. After gathering all the data, we decided on the following steps for
the overall analysis of this project, for consistency.
Step 1: Initial check for assumptions
Step 2: Check need for transformations (Box Cox)
Step 3: Check multicollinearity
Step 4: Check for curvature/possible need for 2nd order model
Step 5: Check interactions
Step 6: Select model (backward step-wise, AIC and adjusted R²)
Step 7: Check for assumptions:
Constant variance of error (Breusch-Pagan)
Normality (p-values)
Outliers (Hat Matrix, DFFITS, Cook’s Distance)
Step 8: Interpret Final Equation (CIs, PIs, effects of interactions)
Initial Check of Assumptions
Analysis through plots of residuals
By creating various plots of residuals against the independent variables and the predicted
values, we did not find any patterns suggesting non-constant variance of error.
Box-Cox
A Box-Cox calculation was performed on the Y variable to automatically identify if any
transformations were needed. The resulting λ value equaled 1 confirming that no transformations
were needed for our dependent variable Y.
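To illustrate why λ = 1 means no transformation, here is a minimal Python sketch of the standard Box-Cox family (not the SAS procedure itself). The function name `box_cox` and the sample rates are hypothetical; at λ = 1 the transform only shifts each value by 1, so the shape of Y is unchanged.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)  # limiting case as lambda approaches 0
    return (y ** lam - 1) / lam


# Hypothetical fatality rates (per 100K); not values from the project data.
rates = [5.0, 10.0, 20.0]
shifted = [box_cox(y, 1.0) for y in rates]
print(shifted)  # each value is the original minus 1
```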
Multicollinearity
After checking residual plots of Y= X1+X2+X3+X4+X5+X6+X7, we checked whether the
independent variables were correlated with each other by running a Pearson correlation matrix.
When the correlation between two predictors is above .80, it is conventional to drop one of them
from the equation. The correlation between X1 and X2 was 0.81262.
Pearson’s Correlation Matrix
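The screening rule can be sketched in Python as below. This is an illustration only, not the SAS `proc corr` step used in the project; the helper names and the toy columns are hypothetical and do not reproduce the project data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def flag_high_pairs(columns, threshold=0.80):
    """Return predictor pairs whose |r| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy predictor columns (hypothetical values).
toy = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 5, 4, 6],
    "x3": [5, 1, 4, 2, 3],
}
print(flag_high_pairs(toy))
```

A flagged pair is only a candidate for removal; which of the two predictors to drop remains a judgment call.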
Because this correlation is very close to 0.80 and we did not want to lose hard-to-gain data
from our model, we tested the stability of the parameter estimates and their standard errors to
see whether the high correlation between X1 and X2 is problematic. Below is a table of the
parameter estimates and their standard errors:
Variables in model            b1          b2          s{b1}     s{b2}
x1                            -55.4746                4.995
x2                                        -0.29423              0.03322
x1, x2                        -47.4629    -0.0599     8.5613    0.54014
x1, x2, x3                    -45.6641    -0.04341    8.7183    0.05419
x1, x2, x3, x4                -43.9369    -0.04537    8.8998    0.05424
x1, x2, x3, x4, x5            -39.7062    -0.04135    8.6275    0.05226
x1, x2, x3, x4, x5, x6        -37.3318    -0.05502    8.7057    0.05266
x1, x2, x3, x4, x5, x6, x7    -41.4501    -0.0488     8.5789    0.05127
The parameter estimates fluctuated up and down, most likely because of the high correlation
between X1 and X2. To make matters worse, the standard errors of the parameters increased
when X2 was added to the model. Therefore, these three factors (a Pearson correlation of
0.81, unstable parameter estimates, and inflated and unstable standard errors) provided
ample evidence to conclude that X2 should be dropped from our list of predictors.
Checking Curvature Plots
Upon inspection, we did notice possible curvature among a few of our variables, namely X4 and
X6 (we have already dropped X2):
After centering X4 and X6 and adding a (centered and) squared term, we found that the
correlation coefficients (.93 and .83 between X4d/X4d^2 and X6d/X6d^2, respectively) were too
high to keep the squared terms in our model.
Model Selections and assumptions
After dropping X2, X4d^2, and X6d^2 because of high correlation values, we then checked the
remaining variables and found that X3, without X2 in the model, had a high p-value. A partial
F-test confirmed that X3 no longer contributed to our model, so we dropped this variable as well.
Our equation then became: Y= X1 +X4+X5+X6+X7
Next, we tested whether interactions between the remaining variables existed by multiplying
each variable with each of the others. In our case, 5 variables taken 2 at a time gave
a total of 10 interaction terms.
The resulting equation was:
Y= X1 + X4 + X5 + X6 + X7
+ X1X4 + X1X5 + X1X6 + X1X7
+ X4X5 + X4X6 + X4X7
+ X5X6 + X5X7
+ X6X7
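The count of 10 pairwise interactions can be checked with a short Python sketch (an illustration, not part of the SAS analysis). The variable names follow the report; the list names are hypothetical.

```python
from itertools import combinations

# The five main effects that remain after dropping X2 and X3.
main_effects = ["X1", "X4", "X5", "X6", "X7"]

# Every unordered pair of distinct predictors gives one interaction term.
interaction_terms = [a + b for a, b in combinations(main_effects, 2)]

print(len(interaction_terms))
print(interaction_terms)
```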
Backward Step-wise Selection
To simplify our equation, we performed backward step-wise regression to drop any unnecessary
terms. We also tested whether the slopes of the interaction terms equaled zero by running a
regression procedure in SAS. We were able to reduce our equation to 10 terms.
The resulting equation was:
Y= X1 + X4 + X5 + X6 + X7 + X1X4 + X1X7 + X4X5 + X5X7 + X6X7
Plot of Residuals vs Predicted Values
Once again, we plotted residuals against predicted values of Y to look for any systematic pattern
suggesting non-constant variance of error. As seen from the random pattern in the graph below,
the error variance does not change systematically, which we expected given our test results.
Check for Normality
We tested our final model for normality by using a univariate procedure in SAS to obtain a
stem-and-leaf plot and a normal probability plot of the residuals. From the bell-shaped curve and
the straight line, we can see that our new equation meets the assumption of normality. The high
p-values of the tests shown in the graph below are also consistent with normality.
Breusch-Pagan
We also checked our model for constancy of error variance with a Breusch-Pagan Test. As seen
from the results below, the p-value greater than 0.05 is consistent with constancy of error
variance. Thus, we conclude homoscedasticity and move forward with this model.
Model Diagnostics: Looking for Outliers
For a final diagnostic measure, we used four methods of analysis to look for potentially
problematic outliers. The results of all four tests indicated that there were some outliers, but
not enough to call for any corrective measures.
For a family significance level of α=0.05 and a sample size of 150, t(0.9983, 138)=2.98. One
observation (Mississippi) was identified as an outlier according to its studentized deleted residual.
With 2p/n=0.15 as a guide for identifying outlying observations according to their leverage values,
some observations were identified as outliers. To determine the influence of these outliers, their
DFFITS (influential if the absolute value is greater than one) and Cook’s distance (influential if
above the 50th percentile) were considered. In summary, some potential problems were identified,
but none of these was considered serious enough to require further remedial measures.
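The leverage guide and the degrees of freedom quoted above can be reproduced with a short Python sketch. It assumes p = 11 parameters (the 10 model terms plus an intercept), which is our reading rather than something the report states; it is consistent with the 138 degrees of freedom. All names are hypothetical.

```python
n_obs = 150            # sample size stated in the report
n_terms = 10           # terms in the final model
p = n_terms + 1        # assumption: an intercept is included

# Rule-of-thumb leverage cutoff: hat values above 2p/n are flagged.
leverage_cutoff = 2 * p / n_obs

# Degrees of freedom for studentized deleted residuals: n - p - 1.
error_df = n_obs - p - 1


def is_high_leverage(hat_value: float) -> bool:
    """True when a hat-matrix diagonal exceeds the 2p/n guide."""
    return hat_value > leverage_cutoff


print(round(leverage_cutoff, 2), error_df)
```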
Studentized Deleted Residuals and Hat Matrix
DFFITS and Cook’s Distance
Interpretation of the Results
Our final equation with coefficients is:
Y= -0.7729 + 4.09x1 - 0.0014x4 + 0.3481x5 + 0.2819x6 + 0.0962x7 + 0.0035x1x4 - 0.2167x1x7
+ 0.0002x4x5 - 0.0038x5x7 - 0.0009x6x7.
Interpretation of variables
X1: College Graduates
As College graduates (X1) ↑, Firearm Fatalities (Y) ↓
X4: Gun Ownership (NICS checks)
As Gun Ownership (X4) ↑, Firearm Fatalities (Y) ↑
Note: 3 odd observations: Y decreased in AL, ID, IN as X4 increased.
Possible reasons: Rejected NICS applications kept a significant portion of the population from
purchasing guns.
X5: High School Dropout Rate
As High School Dropout Rate (X5) ↑, Firearm Fatalities (Y) ↑
Note: 57 odd observations from 18 states. Although 3 states were outliers, 15 states had negative
relationships to Y.
Possible reasons: In those states, it may be easier to obtain employment without a high school
diploma, allowing for more stable lifestyles.
X6: Average Housing Price was not linearly related to Y.
When total law enforcement (X7) < 313, meaning fewer officers, firearm death rates increased
as houses became more expensive.
Possible reasons: An unguarded expensive house becomes a criminal target.
When total law enforcement (X7) > 313, meaning more officers, firearm deaths decreased as
houses became more expensive.
Possible reasons: More officers to protect nicer houses.
X7: Total full-time Law Enforcement Officers
As Number of police officers (X7) ↑, Firearm Fatalities (Y) ↑
This is likely because the increase in the number of police officers is not the CAUSE of deaths
by guns but the RESULT.
When house prices (X6)=107, the number of police officers has no effect on Y.
Possible reasons: If you live in an expensive house, no matter how many officers you have in that
area, it won't increase the death rate by firearm.
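The break points 313 and 107 can be recovered from the fitted coefficients, as the following Python sketch suggests. With an X6X7 interaction, the slope of Y with respect to X6 is 0.2819 - 0.0009·X7, which changes sign near X7 = 313. The X6 = 107 figure appears to come from setting 0.0962 - 0.0009·X6 to zero, that is, ignoring the X1X7 and X5X7 terms; that reading is our assumption. The dictionary and function names are hypothetical.

```python
# Coefficients copied from the final fitted equation in the report.
COEF = {
    "x1": 4.09, "x4": -0.0014, "x5": 0.3481, "x6": 0.2819, "x7": 0.0962,
    "x1x4": 0.0035, "x1x7": -0.2167, "x4x5": 0.0002,
    "x5x7": -0.0038, "x6x7": -0.0009,
}


def slope_x6(x7: float) -> float:
    """Change in Y per unit of X6, given the level of X7."""
    return COEF["x6"] + COEF["x6x7"] * x7


def slope_x7(x1: float, x5: float, x6: float) -> float:
    """Change in Y per unit of X7, given X1, X5 and X6."""
    return (COEF["x7"] + COEF["x1x7"] * x1
            + COEF["x5x7"] * x5 + COEF["x6x7"] * x6)


# X7 level at which the X6 slope changes sign.
x7_break = -COEF["x6"] / COEF["x6x7"]

# X6 level at which the X7 slope is zero when the X1 and X5
# interaction terms are ignored (assumption, see lead-in).
x6_break = -COEF["x7"] / COEF["x6x7"]

print(round(x7_break, 1), round(x6_break, 1))
```

Under this sketch, `slope_x6` is positive below the X7 break point and negative above it, which matches the two cases described for X6.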
Conclusion
As America’s trend in gun violence continues, it would be fair to say that the dilemma of
decreasing firearm fatalities while keeping intact our gun heritage is dynamic and complicated. It
was interesting to see that this complexity was supported in our model, as we found it difficult to
make broad conclusions, even when strong associations were evident. This was especially
noticeable in our interaction terms, where contradicting results among states made overarching
conclusions difficult. Despite all the overlapping effects of interactions, we can say that as more
and more studies are published with sobering evidence suggesting a strong correlation between
firearm deaths and firearms, some sort of modification to our gun culture may seem within
reason.
In our study, the factor most strongly correlated with firearm fatalities was educational
attainment in the form of college degrees: as education increased, deaths caused by firearms
decreased. But since it would be completely unrealistic to suggest that every state increase its
number of college-educated residents, maybe education in another form could be considered. This
could include mandatory education in firearm safety or perhaps even tests for gun licenses
similar to those for driving licenses.
From working on this project, we learned that there is more than one way to analyze and create
regression models, and that seemingly subtle changes in procedure can often lead to significantly
different conclusions! This can also be attributed to the fact that real-life data is complex and
often challenging to work with. In the future, we look forward to learning various other methods
of selecting regression models.
Appendix A: SAS code
/* State: name of the state
Variables:
Y = Total firearm deaths (per 100K people of population)
X1 = College Graduates (of population)
X2 = Median income in $10K
X3 = % Very Religious
X4 = NICS checks (per 100K people of population)
X5 = High school dropout rate
X6 = Average house price in $10K
X7 = Total full-time law enforcement employees (per 100K people of
population) */
*libname ch 'e:STAT6509project';
libname ch 'f:STAT6509project';
data ch.proj;
set ch.firearms;
id=_n_;
y=y*100;
x2=x2/10000; x4=x4*100; x6=x6/10; x7=x7*100;
x1x2 = x1*x2; x1x3 = x1*x3; x1x4 = x1*x4; x1x5 = x1*x5; x1x6 = x1*x6; x1x7 = x1*x7;
x2x3 = x2*x3; x2x4 = x2*x4; x2x5 = x2*x5; x2x6 = x2*x6; x2x7 = x2*x7;
x3x4 = x3*x4; x3x5 = x3*x5; x3x6 = x3*x6; x3x7 = x3*x7;
x4x5 = x4*x5; x4x6 = x4*x6; x4x7=x4*x7;
x5x6 = x5*x6; x5x7 = x5*x7;
x6x7 = x6*x7;
run;
proc univariate data=ch.proj normal plot;
var y;
run;
proc corr data=ch.proj; /* drop X2 due to high correlation */
var y x1 x2 x3 x4 x5 x6 x7;
run;
proc sgscatter data=ch.proj;
matrix y x1 x3 x4 x5 x6 x7;
run;
*Suggests lambda = 1 so no transformation necessary;
proc transreg data=ch.proj ss2 details;
model boxcox(y) = identity(x1 x3 x4 x5 x6 x7);
run;
/* model selections */
/* AdjRsq=0.53 */
proc reg data=ch.proj;
model y = x1 x3 x4 x5 x6 x7;
output out=myout;
run; quit; /* Drop X3 due to high p-value. */
/* select a first-order model with backward */
proc reg data=ch.proj;
model y = x1 x4 x5 x6 x7
x1x4 x1x5 x1x6 x1x7 x4x5 x4x6 x4x7 x5x6 x5x7 x6x7 /selection=backward
slentry=0.1 slstay=0.1;
run; quit;
proc reg data=ch.proj; /* AdjRsq=0.69 */
model y = x1 x4 x5 x6 x7 x1x4 x1x7 x4x5 x5x7 x6x7;
output out=ch.myout r=residual;
run; quit;
proc univariate data=ch.myout normal plot; /* Normality test, p-value=0.61 */
var residual;
run;
/* Breusch/White test for heteroscedasticity, p-value=0.078 */
proc model data=ch.proj;
parms const inc1 inc2 inc3 inc4 inc5 inc6 inc7 inc8 inc9 inc10;
y = const + inc1*x1 + inc2*x4 + inc3*x5 + inc4*x6 + inc5*x7 +
inc6*x1x4 + inc7*x1x7 + inc8*x4x5 + inc9*x5x7 + inc10*x6x7;
fit y / white breusch=(1 x1 x4 x5 x6 x7 x1x4 x1x7 x4x5 x5x7 x6x7);
run; quit;
/* select a first-order model with best model selection: same as backward selection */
proc reg data=ch.proj;
model y = x1 x4 x5 x6 x7
x1x4 x1x5 x1x6 x1x7 x4x5 x4x6 x4x7 x5x6 x5x7 x6x7 /selection=adjrsq cp
aic sbc b best=3;
run; quit;
/* identify outliers */
/* t(0.9983, 138)=2.981 */
/* full diagnostics */
proc reg data=ch.proj alpha=0.05;
model y = x1 x4 x5 x6 x7 x1x4 x1x7 x4x5 x5x7 x6x7 / influence;
output out=delres RStudent=ti H=hi cookd=cookd dffits=dffits;
run; quit;
proc print data=delres; /* there is an outlier: MS */
where ti>3 or ti<-3;
run;
proc print data=delres;
where cookd >= 1;
run;
proc contents data=delres;
run;
symbol value=dot;
proc gplot data=delres;
plot ti*id;
plot hi*id;
plot dffits*id;
plot cookd*id;
run; quit;
/* select a first-order model with stepwise */
proc reg data=ch.proj;
model y = x1 x4 x5 x6 x7
x1x4 x1x5 x1x6 x1x7 x4x5 x4x6 x4x7 x5x6 x5x7 x6x7 /selection=stepwise
slentry=0.1 slstay=0.1;
run; quit;
proc reg data=ch.proj; /* AdjRsq=0.67 */
model y = x1 x4 x5 x7 x4x5 x4x7 x5x7;
output out=ch.myout r=residual;
run; quit;
proc univariate data=ch.myout normal plot; /* Normality test, p-value=0.99 */
var residual;
run;
/* Breusch/White test for heteroscedasticity, p-value=0.03 */
proc model data=ch.proj;
parms const inc1 inc2 inc3 inc4 inc5 inc6 inc7 inc8;
y = const + inc1*x1 + inc2*x4 + inc3*x5 + inc4*x6 + inc5*x7 +
inc6*x4x5 + inc7*x4x7 +inc8*x5x7;
fit y / white breusch=(1 x1 x4 x5 x7 x4x5 x4x7 x5x7);
run; quit;
Appendix B: Sources
1. Full-time Law Enforcement Employees by State, 2010
FBI website:
http://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2010/crime-in-the-u.s.-
2010/tables/10tbl77.xls
2. Full-time Law Enforcement Employees by State, 2009
FBI website:
http://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2009
3. Income of Household by state using 2-year-average medians
US Census Bureau: 2008-2009, 2009-2010, 2010-2011
http://www.census.gov/hhes/www/income/data/statemedian/
4. Year 2010 data is for
High school dropout numbers, 2009-2010, Table 4
National Center for Education Statistics website
http://nces.ed.gov/pubs2013/2013309rev.pdf
5. Year 2009 data is for
High school dropout number, 2008-2009, Table 4
http://nces.ed.gov/pubs2011/2011312.pdf
6. Year 2008 data is
High school dropout number, 2007-2008, Table 4
http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2010341
7. Unemployment rate 2010
Bureau of Labor Statistics website
http://www.bls.gov/lau/lastrk10.htm
8. Unemployment rate 2009
http://www.bls.gov/lau/lastrk09.htm
9. Unemployment rate 2008
http://www.bls.gov/lau/lastrk08.htm
10. Total NICS Firearm Background Checks by State
FBI website
http://www.fbi.gov/about-us/cjis/nics/reports/nics-firearm-background-checks-
1998_2013_state_monthly_totals-033113.pdf
11. Existing home sales by the state 2008-2010:
http://www.census.gov/compendia/statab/cats/construction_housing/housing_sales.html