- OBSERVED TREND For Research Question 1: According to the data, there is an opioid crisis in America. Opioid related deaths increased by more than an order of magnitude in 17 years. (1999: 2.0 deaths per 100,000 population; 2016: 12.6 deaths per 100,000 population.) By comparison, there was a negligable increase in anaesthesia deaths during the same period. Males are almost twice as likely to be affected as females. The age range of 25 - 54 years is most likely to be affected by the crisis. White people are twice as likely to be affected as black people; indigenous peoples are also heavily affected, while asian and pacific islanders are least affected by the crisis. The crisis is distributed fairly evenly throughout the continental United States, although there is a band in the geographic center of the country that is relatively unaffected. Both coasts of the country are heavily affected. Some notable hot-spots include West Virginia, the New England region, Oklahoma, parts of New Mexico, Utah, and Nevada, and the Pacific Northwest. Southern Alaska is affected, while Hawaii is only moderately affected.
- OBSERVED TREND For Research Question 2: CMS Medicare Part D Opioid Prescriber data reflects the change in opioid prescribing rates from 2013 to 2015 at the at the state level. At a high level, the majority of states see a decrease in opioid prescribing rates and an increase in extended-release rates over the three year period. Wyoming is the only state to see an increase in opioid prescribing (0.23) and extended release prescribing (0.08). There are a few states that saw a decrease in both opioid prescribing and extended release prescribing: District of Columbia (-0.21 and -0.04), New Mexico (- 0.04 and -0.09), Oregon ( -0.30 and -0.15), and Washington ( -0.07 and -0.05)
- OBSERVED TREND For Research Question 3: Looking at 2010 census data of urban population percentage for each state and the KFF analysis of CDC opioid overdose deaths, there seems to be a significant increase in the change in opioid deaths the more rural a state is in that time period (p-value = 0.003). Opioid overdose deaths are used as a proxy for opioid usage in addition as a measure of the negative impact of opioid abuse.
It is a common assumption that crime and drugs are tightly correlated, as had been the case previously with heroin in the 1970s and crack in the 1980s. I test this association to see if it carries over to the current opioid epidemic. It seems that unlike the previous drug epidemics, the impact on crime seems to be negligible. Using the previous opioid death data and FBI uniform crime report data, changes from 1999-2014 in violent crime rates, robbery rates, and property crime rates each did not seem to have a significant impact on 2014 opioid death rates (p-values are 0.776, 0.513, 0.283 respectively). - OBSERVED TREND For Research Question 4: Total CMS Spending increased from 398,350 to 1,254,525 million (214% increase). In this same time period National Health Expenditures also increased from 1,277,700 to 3,337,248 million (160% increase). Even though all National Health Expenditure costs increased, the percentage of those costs that were CMS increased from 31 to 37%.
Presentation Link: https://docs.google.com/presentation/d/e/2PACX-1vQqFIFLN2oR6E-hy-oEYjMyUquZUFxDQNi7EDebgkT-ytnWOijaL77kkB03nuBC1IDN-k-fZZ9sDYRP/pub?start=true&loop=false&delayms=60000
Developer caveat for Question 6 data set 6 (map.) Geoplotlib does not have a direct analog for matplotlib.savefig. Screenshots (cntl-p in running map window) of the generated maps are available in the Output folder named Q1_Plot6a, b, and c.png. See requirements.txt for geoplotlib dependencies.
Developer caveat for warnings generated by statsmodel.api; Warnings are not important for this usage, as the only part deprecated is for timestamping generated output.
import os # os library
import numpy as np # numpy library
import pandas as pd # pandas library
import matplotlib.pyplot as plt # pyplot module from matplotlib library
import geoplotlib # geoplotlib library
from geoplotlib.utils import BoundingBox # BoundingBox module from geoplotlib.utils
from geoplotlib.colors import ColorMap # ColorMap module from geoplotlib.colors
import seaborn as sns # seaborn library
import json # json library
import statsmodels.api as sm # to use regressions
C:\Users\fcardwell\AppData\Local\Continuum\anaconda3\envs\PythonData\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
sns.set() # switches to seaborn default display
Research Question 1: Does the crisis exist? If so, what is it's magnitude? Who is affected? Where is it occuring?
filename = 'Q1DS1.csv' # 1st data file for Q1
csv_file = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds1_df = pd.read_csv(csv_file, index_col="Year") # reads data from file
q1ds1_df.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Year Code | Deaths | Population | Death Rate | |
---|---|---|---|---|
Year | ||||
1999 | 1999 | 5594 | 279040168 | 2.004729 |
2000 | 2000 | 6011 | 281421906 | 2.135939 |
2001 | 2001 | 7088 | 284968955 | 2.487288 |
2002 | 2002 | 9318 | 287625193 | 3.239633 |
2003 | 2003 | 10389 | 290107933 | 3.581081 |
year_min = q1ds1_df.index.min() # finds min and max dates
year_max = q1ds1_df.index.max()
plt.figure(figsize = (17,10)) # sets bar chart parameters
plt.title('Opioid Deaths By Year: %s to %s' % (year_min, year_max), fontdict = {'fontsize': 20})
plt.xlabel('Year', fontdict = {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds1_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, q1ds1_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds1_df['Year Code'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xdc76048>,
<matplotlib.axis.XTick at 0x3c2fa20>,
<matplotlib.axis.XTick at 0xdc5ada0>,
<matplotlib.axis.XTick at 0xdcf1ef0>,
<matplotlib.axis.XTick at 0xdcfe630>,
<matplotlib.axis.XTick at 0xdcfeda0>,
<matplotlib.axis.XTick at 0xdd02550>,
<matplotlib.axis.XTick at 0xdd02cc0>,
<matplotlib.axis.XTick at 0xdd08470>,
<matplotlib.axis.XTick at 0xdd08be0>,
<matplotlib.axis.XTick at 0xdd0f390>,
<matplotlib.axis.XTick at 0xdd0fb00>,
<matplotlib.axis.XTick at 0xdd142b0>,
<matplotlib.axis.XTick at 0xdd14a20>,
<matplotlib.axis.XTick at 0xdd191d0>,
<matplotlib.axis.XTick at 0xdd19940>,
<matplotlib.axis.XTick at 0xdd1d0f0>,
<matplotlib.axis.XTick at 0xdd1d860>],
<a list of 18 Text xticklabel objects>)
axes = plt.gca()
c1min, c1max = axes.get_ylim() # gets y limits, sanity check
plt.savefig('./Output/Q1_Plot1.png', bbox_inches = 'tight') # saves plot to file
plt.show() # displays plot
filename = 'Q1DS2.csv' # 2nd data file for Q1
csv_file = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds2_df = pd.read_csv(csv_file, index_col="Year") # reads data from file
year_min = q1ds2_df.index.min() # finds min and max dates
year_max = q1ds2_df.index.max()
plt.figure(figsize = (17,10)) # sets bar chart parameters
plt.title('Anaesthesia Deaths By Year: %s to %s' % (year_min, year_max), fontdict = {'fontsize': 20})
plt.xlabel('Year', fontdict = {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds2_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, q1ds2_df['Death Rate'], color='b', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds2_df['Year Code'], rotation="horizontal")
axes = plt.gca()
axes.set_ylim([c1min,c1max]) # set y limits to match first chart
(0.0, 13.250775089523252)
plt.savefig('./Output/Q1_Plot2.png', bbox_inches = 'tight') # saves plot to file
plt.show() # displays plot
filename = 'Q1DS3.csv' # 3rd data file for Q1
csv_file = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds3_df = pd.read_csv(csv_file) # reads data from file
plt.figure(figsize = (12,10)) # sets bar chart parameters
plt.title('Opioid Death Rate By Gender', fontdict = {'fontsize': 20})
plt.xlabel('Gender', fontdict = {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds3_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, q1ds3_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds3_df['Gender'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xdf09278>, <matplotlib.axis.XTick at 0xdeafc88>],
<a list of 2 Text xticklabel objects>)
plt.savefig('./Output/Q1_Plot3.png', bbox_inches = 'tight') # saves plot to file
plt.show() # displays plot
filename = 'Q1DS4.csv' # 4th data file for Q1
csv_file = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds4_df = pd.read_csv(csv_file) # reads data from file
plt.figure(figsize = (12,10)) # sets bar chart parameters
plt.title('Opioid Death Rate By Age Group', fontdict = {'fontsize': 20})
plt.xlabel('Ten-Year Age Groups', fontdict = {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds4_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, q1ds4_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds4_df['Ten-Year Age Groups'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xe882c50>,
<matplotlib.axis.XTick at 0xe882160>,
<matplotlib.axis.XTick at 0xe88d668>,
<matplotlib.axis.XTick at 0xe477f98>,
<matplotlib.axis.XTick at 0xe4806d8>,
<matplotlib.axis.XTick at 0xe480e48>,
<matplotlib.axis.XTick at 0xe4855f8>,
<matplotlib.axis.XTick at 0xe485d68>,
<matplotlib.axis.XTick at 0xe489518>,
<matplotlib.axis.XTick at 0xe489c88>,
<matplotlib.axis.XTick at 0xe48f438>],
<a list of 11 Text xticklabel objects>)
plt.savefig('./Output/Q1_Plot4.png', bbox_inches = 'tight') # saves plot to file
plt.show() # displays plot
filename = 'Q1DS5.csv' # 5th data file for Q1
csv_file = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
q1ds5_df = pd.read_csv(csv_file) # 5th data file for Q1
plt.figure(figsize = (12,10)) # sets bar chart parameters
plt.title('Opioid Death Rate By Race', fontdict = {'fontsize': 20})
plt.xlabel('Race', fontdict = {'fontsize': 16})
plt.ylabel('Deaths Per 100,000 Population', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds5_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, q1ds5_df['Death Rate'], color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds5_df['Race'], rotation="horizontal")
([<matplotlib.axis.XTick at 0xe669eb8>,
<matplotlib.axis.XTick at 0xe662f98>,
<matplotlib.axis.XTick at 0xe666ef0>,
<matplotlib.axis.XTick at 0xe697f28>],
<a list of 4 Text xticklabel objects>)
plt.savefig('./Output/Q1_Plot5.png', bbox_inches = 'tight') # saves plot to file
plt.show() # displays plot
filename = 'Q1DS6.json' # deaths data file for Q1
json_file_1 = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
filename = 'gz_2010_us_050_00_20m.json' # map shape data file for Q1
json_file_2 = os.path.join(".", "Data Files", "Question_1", filename) # creates path to read data
# function finds the death rate for the selected county, and converts it to color
def get_color(properties):
key = str(int(properties['STATE'])) + properties['COUNTY']
if key in Q1DS6:
return cmap.to_color(Q1DS6.get(key), 5, 'lin')
else:
return [0, 0, 0, 0]
with open(json_file_1) as fin: # opens json file
Q1DS6 = json.load(fin)
cmap = ColorMap('Reds', alpha=255, levels=5) # displays map plot
geoplotlib.geojson(json_file_2, fill=True, color=get_color, f_tooltip=lambda properties: properties['NAME'])
geoplotlib.geojson(json_file_2, fill=False, color=[255, 255, 255, 64])
geoplotlib.set_bbox(BoundingBox.USA)
geoplotlib.show()
Change in Medicare Part D Opioid Prescribing Rates from 2013 to 2015 by State
# Took CMS Medicare Part D Opioid Prescribing Geographic 2013 2015 Excel file and created CSV files for the three taps 1) state 2) county and 3) zip
# Stored presciber-state file in a variable
prescriber_state = "Data Files/prescriber_state.csv"
prescriber_state_df = pd.read_csv(prescriber_state)
prescriber_state_df.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
State_Name | State_Abbreviation | State_FIPS | 2013_Part_ D_Prescribers | 2013_Part_Opioid_Prescribers | 2013_Opioid Claims | 2013_Extended_Release_Opioid_Claims | 2013_Overall_Claims | 2013_Opioid_Prescribing_Rate | 2013_Extended_Release_Opioid_Prescribing_Rate | ... | 2014_Extended_Release_Opioid_Prescribing_Rate | 2015_Part D Prescribers | 2015_Part D Opioid Prescribers | 2015_Opioid Claims | 2015_Extended Release Opioid Claims | 2015_Overall_Claims | 2015_Opioid_Prescribing_Rate | 2015_Extended_Release_Opioid_Prescribing_ Rate | 2013_2015_Change_in_Opioid_Prescribing_Rate | 2013_2015_Change_in_Extended_Release_Opioid_Prescribing_Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | National | NaN | NaN | 1,037,770 | 491,500 | 78,045,683 | 4,773,286 | 1,342,096,654 | 5.82 | 6.12 | ... | 6.27 | 1,090,979 | 496,383 | 78,372,855 | 5,257,481 | 1,418,618,875 | 5.52 | 6.71 | -0.30 | 0.59 |
1 | Alabama | AL | 1.0 | 12,820 | 7,422 | 2,260,284 | 118,523 | 29,160,952 | 7.75 | 5.24 | ... | 5.24 | 13,492 | 6,980 | 2,179,506 | 121,061 | 28,898,258 | 7.54 | 5.55 | -0.21 | 0.31 |
2 | Alaska | AK | 2.0 | 2,275 | 1,099 | 86,517 | 8,602 | 1,281,057 | 6.75 | 9.94 | ... | 10.44 | 2,342 | 1,067 | 89,275 | 9,509 | 1,339,671 | 6.66 | 10.65 | -0.09 | 0.71 |
3 | Arizona | AZ | 4.0 | 20,542 | 10,510 | 1,545,138 | 151,086 | 22,126,421 | 6.98 | 9.78 | ... | 9.78 | 21,833 | 10,992 | 1,649,600 | 169,215 | 24,315,477 | 6.78 | 10.26 | -0.20 | 0.48 |
4 | Arkansas | AR | 5.0 | 7,909 | 4,700 | 1,128,356 | 57,519 | 16,759,116 | 6.73 | 5.10 | ... | 5.17 | 8,327 | 4,714 | 1,132,475 | 63,261 | 17,786,636 | 6.37 | 5.59 | -0.36 | 0.49 |
5 rows × 26 columns
#Selected the opioid prescribe rate columsn for years 2013-2015
prescriber_state_rate_df = prescriber_state_df[["State_Name","State_Abbreviation","2013_Opioid_Prescribing_Rate","2013_Extended_Release_Opioid_Prescribing_Rate","2014_Opioid_Prescribing_Rate","2014_Extended_Release_Opioid_Prescribing_Rate","2015_Opioid_Prescribing_Rate","2015_Extended_Release_Opioid_Prescribing_ Rate","2013_2015_Change_in_Opioid_Prescribing_Rate","2013_2015_Change_in_Extended_Release_Opioid_Prescribing_Rate "]]
prescriber_state_df.fillna("National")
prescriber_state_df.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
State_Name | State_Abbreviation | State_FIPS | 2013_Part_ D_Prescribers | 2013_Part_Opioid_Prescribers | 2013_Opioid Claims | 2013_Extended_Release_Opioid_Claims | 2013_Overall_Claims | 2013_Opioid_Prescribing_Rate | 2013_Extended_Release_Opioid_Prescribing_Rate | ... | 2014_Extended_Release_Opioid_Prescribing_Rate | 2015_Part D Prescribers | 2015_Part D Opioid Prescribers | 2015_Opioid Claims | 2015_Extended Release Opioid Claims | 2015_Overall_Claims | 2015_Opioid_Prescribing_Rate | 2015_Extended_Release_Opioid_Prescribing_ Rate | 2013_2015_Change_in_Opioid_Prescribing_Rate | 2013_2015_Change_in_Extended_Release_Opioid_Prescribing_Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | National | NaN | NaN | 1,037,770 | 491,500 | 78,045,683 | 4,773,286 | 1,342,096,654 | 5.82 | 6.12 | ... | 6.27 | 1,090,979 | 496,383 | 78,372,855 | 5,257,481 | 1,418,618,875 | 5.52 | 6.71 | -0.30 | 0.59 |
1 | Alabama | AL | 1.0 | 12,820 | 7,422 | 2,260,284 | 118,523 | 29,160,952 | 7.75 | 5.24 | ... | 5.24 | 13,492 | 6,980 | 2,179,506 | 121,061 | 28,898,258 | 7.54 | 5.55 | -0.21 | 0.31 |
2 | Alaska | AK | 2.0 | 2,275 | 1,099 | 86,517 | 8,602 | 1,281,057 | 6.75 | 9.94 | ... | 10.44 | 2,342 | 1,067 | 89,275 | 9,509 | 1,339,671 | 6.66 | 10.65 | -0.09 | 0.71 |
3 | Arizona | AZ | 4.0 | 20,542 | 10,510 | 1,545,138 | 151,086 | 22,126,421 | 6.98 | 9.78 | ... | 9.78 | 21,833 | 10,992 | 1,649,600 | 169,215 | 24,315,477 | 6.78 | 10.26 | -0.20 | 0.48 |
4 | Arkansas | AR | 5.0 | 7,909 | 4,700 | 1,128,356 | 57,519 | 16,759,116 | 6.73 | 5.10 | ... | 5.17 | 8,327 | 4,714 | 1,132,475 | 63,261 | 17,786,636 | 6.37 | 5.59 | -0.36 | 0.49 |
5 rows × 26 columns
#Selected the opioid prescribe rate columsn for years 2013-2015
prescriber_state_rate_changeex_df = prescriber_state_df[["State_Name","2013_2015_Change_in_Opioid_Prescribing_Rate", "2013_2015_Change_in_Extended_Release_Opioid_Prescribing_Rate "]]
prescriber_state_rate_changeex_df.columns = ['State', 'Opioid', 'Extended Release']
prescriber_state_rate_changeex_df.set_index("State", inplace = True)
prescriber_state_rate_changeex_df.head()
prescriber_state_rate_change_za =prescriber_state_rate_changeex_df.sort_index(ascending=False)
prescriber_state_rate_change_za.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
Opioid | Extended Release | |
---|---|---|
State | ||
Wyoming | 0.23 | 0.08 |
Wisconsin | -0.10 | 0.30 |
West Virginia | -0.73 | 0.33 |
Washington | -0.07 | -0.05 |
Virginia | -0.41 | 0.45 |
# Setting the positions and width for the bars
pos = list(range(len(prescriber_state_rate_df["State_Name"])))
width = 0.25
# Plotting the bars
fig, ax = plt.subplots(figsize=(12,10))
# Create a bar with pre_score data,
# in position pos,
plt.bar(pos,
#using df['pre_score'] data,
prescriber_state_rate_changeex_df["Opioid"],
# of width
width,
# with alpha 0.5
alpha=0.5,
# with color
color='r',
# with label the first value in first_name
label=prescriber_state_rate_changeex_df["Opioid"][0])
# Create a bar with mid_score data,
# in position pos + some width buffer,
plt.bar([p + width for p in pos],
#using df['mid_score'] data,
prescriber_state_rate_changeex_df["Extended Release"],
# of width
width,
# with alpha 0.5
alpha=0.5,
# with color
color='b',
# with label the second value in first_name
label= prescriber_state_rate_changeex_df["Extended Release"][1])
# Set the y axis label
ax.set_ylabel('Opioid Prescriber Rate')
#plt.ylabel("Prescribing Rate")
# Set the chart's title
ax.set_title("Change in Opioid Prescriber Rate 2013-2015", loc='center')
#plt.title("CMS Medicare Part D Opioid Prescribing Rates 2013-2015 by Sate")
# Set the position of the x ticks
ax.set_xticks([p + 1.5 * width for p in pos])
# Set the labels for the x ticks
ax.set_xticklabels(prescriber_state_rate_df["State_Name"], rotation = "vertical")
#plt.xlabel("State")
# Setting the x-axis and y-axis limits
plt.xlim(min(pos)-width, max(pos)+width*4)
plt.ylim([-0.8, max( prescriber_state_rate_changeex_df["Opioid"] + prescriber_state_rate_changeex_df["Extended Release"])] )
# Adding the legend and showing the plot
plt.legend(['Opioid Prescribing Rate', 'Extended Release Rate'], loc='upper right')
plt.grid()
plt.show() # displays plot
# Plot the lollipop plot
plt.figure(figsize=(16,10))
my_range=range(1,len(prescriber_state_rate_change_za.index)+1)
plt.hlines(y=my_range, xmin=prescriber_state_rate_change_za["Opioid"], xmax=prescriber_state_rate_change_za["Extended Release"], color='grey', alpha=0.4)
plt.scatter(prescriber_state_rate_change_za["Opioid"], my_range, color='skyblue', alpha=1, label='Opioid')
plt.scatter(prescriber_state_rate_change_za["Extended Release"], my_range, color='green', alpha=0.4 , label='Extended Release')
plt.legend()
# Add title and axis names
plt.yticks(my_range, prescriber_state_rate_change_za.index)
plt.title("Change in Opioid Prescriber Rate 2013-2015", loc='center')
plt.xlabel('Prescriber Rate')
plt.ylabel('State')
plt.show() # displays plot
Are urban areas impacted more by the opioid epidemic? Scatterplot of Urban vs. not urban
#importing csv files
csv_file = r'Data Files/Question_3/ChangeOpiodDeaths(00-10).csv'
OpDeathsDelta_df = pd.read_csv(csv_file)
# scatterplot: Urban population % by state in year 2010 on change in opioid death rates from years 2000-2010
sns.lmplot(x='2010', y='Change in Death', data = OpDeathsDelta_df, size = 8)
# set titles and labels
plt.title('Opioid Deaths and Urbanization (2000 - 2010)')
plt.xlabel('Urban population % by state in the Year 2010')
plt.ylabel('Change in Opioid Overdose Death Rate per 100,000')
plt.show()
# regression of Opioid Deaths on Urbanization (2000 - 2010)
y = OpDeathsDelta_df['Change in Death']
x = OpDeathsDelta_df['2010']
x = sm.add_constant(x)
results = sm.OLS(y, x).fit()
results.summary()
Dep. Variable: | Change in Death | R-squared: | 0.189 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.170 |
Method: | Least Squares | F-statistic: | 9.814 |
Date: | Mon, 08 Jan 2018 | Prob (F-statistic): | 0.00315 |
Time: | 23:05:27 | Log-Likelihood: | -117.55 |
No. Observations: | 44 | AIC: | 239.1 |
Df Residuals: | 42 | BIC: | 242.7 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | 13.4391 | 2.893 | 4.645 | 0.000 | 7.600 | 19.278 |
2010 | -0.1168 | 0.037 | -3.133 | 0.003 | -0.192 | -0.042 |
Omnibus: | 30.212 | Durbin-Watson: | 2.013 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 79.175 |
Skew: | 1.735 | Prob(JB): | 6.42e-18 |
Kurtosis: | 8.580 | Cond. No. | 416. |
#importing csv file of overall crime
csv_file = r'Data Files/Question_3/crime_df.csv'
crime_df = pd.read_csv(csv_file)
#importing Opioid deaths df
csv_file1 = r'Data Files/Question_3/OpiodDeathsUrbanPop.csv'
OpDeaths_Urb_df = pd.read_csv(csv_file1)
# Scatterplot of change in violent crime and opioid deaths in 2014
plt.figure(figsize=(8,8))
# scatter with regression line.
sns.regplot(x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)'],\
y=crime_df['Violent Crime rate_change'])
#For some reason sns.lmplot doesn't work with data from two df's but sns.regplot does.
# set title and labels
plt.title('Change in Violent Crime Rates (1999-2014) and Opioid Deaths by State (2014)')
plt.xlabel('2014 Opioid Overdose Death Rate (Age-Adjusted)')
plt.ylabel('Change in Violent Crime Rates (per 100,000)')
plt.show()
# regression of Change in Violent Crime Rates (1999-2014) and Opioid Deaths by State (2014)
y = crime_df['Violent Crime rate_change']
x = OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)']
x = sm.add_constant(x)
results = sm.OLS(y, x).fit()
results.summary()
Dep. Variable: | Violent Crime rate_change | R-squared: | 0.002 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.019 |
Method: | Least Squares | F-statistic: | 0.08176 |
Date: | Mon, 08 Jan 2018 | Prob (F-statistic): | 0.776 |
Time: | 23:05:32 | Log-Likelihood: | -318.43 |
No. Observations: | 51 | AIC: | 640.9 |
Df Residuals: | 49 | BIC: | 644.7 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | -84.5308 | 37.541 | -2.252 | 0.029 | -159.971 | -9.090 |
2014__Opioid Overdose Death Rate (Age-Adjusted) | -0.9056 | 3.167 | -0.286 | 0.776 | -7.270 | 5.459 |
Omnibus: | 0.363 | Durbin-Watson: | 1.962 |
---|---|---|---|
Prob(Omnibus): | 0.834 | Jarque-Bera (JB): | 0.291 |
Skew: | -0.175 | Prob(JB): | 0.865 |
Kurtosis: | 2.880 | Cond. No. | 25.1 |
# Scatterplot of change in robbery rates and opioid deaths in 2014
plt.figure(figsize=(8,8))
sns.regplot(x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)'],\
y=crime_df['Robbery rate_change'])
# set title and labels
plt.title('Change in Robbery Rates (1999-2014) and Opioid Deaths by State (2014)')
plt.xlabel('2014 Opioid Overdose Death Rate (Age-Adjusted)')
plt.ylabel('Change in Robbery Rates (per 100,000)')
plt.show()
# regression of Change in Robbery Rates (1999-2014) and Opioid Deaths by State (2014)
y = crime_df['Robbery rate_change']
x = OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)']
x = sm.add_constant(x)
results = sm.OLS(y, x).fit()
results.summary()
Dep. Variable: | Robbery rate_change | R-squared: | 0.009 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.011 |
Method: | Least Squares | F-statistic: | 0.4351 |
Date: | Mon, 08 Jan 2018 | Prob (F-statistic): | 0.513 |
Time: | 23:05:34 | Log-Likelihood: | -250.27 |
No. Observations: | 51 | AIC: | 504.5 |
Df Residuals: | 49 | BIC: | 508.4 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | -37.9212 | 9.864 | -3.845 | 0.000 | -57.743 | -18.099 |
2014__Opioid Overdose Death Rate (Age-Adjusted) | 0.5489 | 0.832 | 0.660 | 0.513 | -1.123 | 2.221 |
Omnibus: | 7.046 | Durbin-Watson: | 1.952 |
---|---|---|---|
Prob(Omnibus): | 0.030 | Jarque-Bera (JB): | 6.576 |
Skew: | -0.875 | Prob(JB): | 0.0373 |
Kurtosis: | 3.178 | Cond. No. | 25.1 |
# Scatterplot of change in property crime rates and opioid deaths in 2014
plt.figure(figsize=(8,8))
sns.regplot(x=OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)'],\
y=crime_df['Property crime rate_change'])
# set title and labels
plt.title('Change in Property Crime Rates (1999-2014) and Opioid Deaths by State (2014)')
plt.xlabel('2014 Opioid Overdose Death Rate (Age-Adjusted)')
plt.ylabel('Change in Property Crime Rates (per 100,000)')
plt.show()
# regression of Change in property crime rates and Opioid Deaths by State (2014)
y = crime_df['Property crime rate_change']
x = OpDeaths_Urb_df['2014__Opioid Overdose Death Rate (Age-Adjusted)']
x = sm.add_constant(x)
results = sm.OLS(y, x).fit()
results.summary()
Dep. Variable: | Property crime rate_change | R-squared: | 0.024 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.004 |
Method: | Least Squares | F-statistic: | 1.180 |
Date: | Mon, 08 Jan 2018 | Prob (F-statistic): | 0.283 |
Time: | 23:05:35 | Log-Likelihood: | -382.17 |
No. Observations: | 51 | AIC: | 768.3 |
Df Residuals: | 49 | BIC: | 772.2 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | -1248.6099 | 130.989 | -9.532 | 0.000 | -1511.842 | -985.377 |
2014__Opioid Overdose Death Rate (Age-Adjusted) | 12.0066 | 11.051 | 1.087 | 0.283 | -10.200 | 34.214 |
Omnibus: | 0.051 | Durbin-Watson: | 2.474 |
---|---|---|---|
Prob(Omnibus): | 0.975 | Jarque-Bera (JB): | 0.175 |
Skew: | -0.068 | Prob(JB): | 0.916 |
Kurtosis: | 2.748 | Cond. No. | 25.1 |
# read cleaned csv into df
filename = 'Q4DS1.csv' # 1st data file for Q4
csv_file = os.path.join(".", "Data Files", "Question_4", filename) # creates path to read data
q4ds1_df = pd.read_csv(csv_file, index_col="Expenditure Amount (Millions)") # reads data from file
q4ds1_df.head()
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
</style>
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Expenditure Amount (Millions) | |||||||||||||||||||||
Total National Health Expenditures | 27214 | 29138 | 31842 | 34595 | 38394 | 41852 | 46081 | 51565 | 58402 | 65923 | ... | 2295307 | 2399121 | 2495414 | 2598823 | 2689349 | 2797260 | 2879008 | 3026157 | 3200815 | 3337248 |
Total CMS Programs (Medicaid, CHIP and Medicare) | - | - | - | - | - | - | 3146 | 8065 | 9759 | 11220 | ... | 767613 | 821394 | 884386 | 928553 | 963412 | 1004951 | 1049032 | 1128749 | 1207734 | 1254525 |
POPULATION | 186 | 189 | 192 | 195 | 197 | 200 | 202 | 204 | 206 | 208 | ... | 301 | 304 | 306 | 309 | 311 | 313 | 316 | 318 | 320 | 322 |
3 rows × 57 columns
# Extract series of spending data 1999 - 2016 to compare against opioid deaths
cms_spending = q4ds1_df.loc['Total CMS Programs (Medicaid, CHIP and Medicare)','1999':'2016']
nhe_spending = q4ds1_df.loc['Total National Health Expenditures','1999':'2016']
percentage = cms_spending / nhe_spending
plt.figure(figsize = (17,10)) # sets bar chart parameters
plt.title('Total CMS Programs Expenditure: %s to %s' % (year_min, year_max), fontdict = {'fontsize': 20})
plt.xlabel('Year', fontdict = {'fontsize': 16})
plt.ylabel('Expenditure Amount (Millions)', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds1_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, cms_spending, color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds1_df['Year Code'], rotation="horizontal")
plt.show()
plt.figure(figsize = (17,10)) # sets bar chart parameters
plt.title('Total National Health Expenditures: %s to %s' % (year_min, year_max), fontdict = {'fontsize': 20})
plt.xlabel('Year', fontdict = {'fontsize': 16})
plt.ylabel('Expenditure Amount (Millions)', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds1_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, nhe_spending, color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds1_df['Year Code'], rotation="horizontal")
plt.show()
# plot CMS spending as percentage of NHE
plt.figure(figsize = (17,10)) # sets bar chart parameters
plt.title('Total CMS Programs Expenditure as a Percentage of NHE: %s to %s' % (year_min, year_max), fontdict = {'fontsize': 20})
plt.xlabel('Year', fontdict = {'fontsize': 16})
plt.ylabel('Expenditure Amount (Millions)', fontdict = {'fontsize': 16})
xvals = np.arange(len(q1ds1_df))
tick_locations = [value+0.4 for value in xvals]
plt.bar(xvals, cms_spending / nhe_spending, color='r', alpha=0.7, align="edge")
plt.xticks(tick_locations, q1ds1_df['Year Code'], rotation="horizontal")
plt.show()