Jupyter lab, Library: request, pandas, json, numpy, seaborn
Here we are going to see if there is a correlation between mortality of COVID-19 and number of hospital beds available in the U.S.. We are going to use API to access the data and join the two sets of data to plot.
Here we are going to do some exercise on getting Covid-19 data by using API, and we will make a plot with seaborn function within Jupyter notebook at the end.
import requests
import pandas as pd
import json
import numpy as np
req=requests.get("https://api.covid19api.com/live/country/united-states").json()
covid=pd.DataFrame(req)
covid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24480 entries, 0 to 24479
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 24480 non-null object
1 Country 24480 non-null object
2 CountryCode 24480 non-null object
3 Province 24480 non-null object
4 City 24480 non-null object
5 CityCode 24480 non-null object
6 Lat 24480 non-null object
7 Lon 24480 non-null object
8 Confirmed 24480 non-null int64
9 Deaths 24480 non-null int64
10 Recovered 24480 non-null int64
11 Active 24480 non-null int64
12 Date 24480 non-null object
dtypes: int64(4), object(9)
memory usage: 2.4+ MB
covid.describe()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Confirmed | Deaths | Recovered | Active | |
---|---|---|---|---|
count | 2.448000e+04 | 24480.000000 | 24480.0 | 2.448000e+04 |
mean | 1.169896e+06 | 15378.652002 | 0.0 | 1.154518e+06 |
std | 1.522009e+06 | 18818.213672 | 0.0 | 1.504000e+06 |
min | 0.000000e+00 | 0.000000 | 0.0 | 0.000000e+00 |
25% | 2.508262e+05 | 2670.000000 | 0.0 | 2.485832e+05 |
50% | 7.225890e+05 | 9371.000000 | 0.0 | 7.122190e+05 |
75% | 1.468140e+06 | 19855.000000 | 0.0 | 1.450549e+06 |
max | 1.116657e+07 | 95311.000000 | 0.0 | 1.107126e+07 |
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
sns.relplot(data=covid,x="Lat", y="Lon", hue="Deaths", size="Confirmed", sizes=(40, 400))
<seaborn.axisgrid.FacetGrid at 0x7ff53588d670>
covid['Lat'][0]
'18.35'
covid['Lat'] = covid['Lat'].astype('float16')
covid['Lon'] = covid['Lon'].astype('float16')
covid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24480 entries, 0 to 24479
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 24480 non-null object
1 Country 24480 non-null object
2 CountryCode 24480 non-null object
3 Province 24480 non-null object
4 City 24480 non-null object
5 CityCode 24480 non-null object
6 Lat 24480 non-null float16
7 Lon 24480 non-null float16
8 Confirmed 24480 non-null int64
9 Deaths 24480 non-null int64
10 Recovered 24480 non-null int64
11 Active 24480 non-null int64
12 Date 24480 non-null object
dtypes: float16(2), int64(4), object(7)
memory usage: 2.1+ MB
Now we can see that the data type of "Lat" and "Lon" has changed. Now we can make the plot correctly.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
fig=sns.relplot(data=covid,x="Lon", y="Lat", hue="Deaths", size="Confirmed", sizes=(40, 400))
fig=sns.relplot(data=covid,x="Lon", y="Lat", hue="Deaths", size="Confirmed", sizes=(40, 400)).set(ylim=(25,50),xlim=(-150,-50))
Now we are going to get another data from different source. We are going to merge these two data by using pandas, and we are going to make a plot with the merged values in the merged data with seaborn
req=requests.get("https://www.communitybenefitinsight.org/api/get_hospitals.php").json()
hospitals=pd.DataFrame(req)
hospitals.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
hospital_id | hospital_org_id | ein | name | name_cr | street_address | city | state | zip_code | fips_state_and_county_code | hospital_bed_count | chrch_affl_f | urban_location_f | children_hospital_f | memb_counc_teach_hosps_f | medicare_provider_number | county | hospital_bed_size | updated_dt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 630307951 | Mizell Memorial Hospital | Mizell Memorial Hospital | 702 Main Street | Opp | AL | 36462 | 01039 | 99 | N | N | N | N | 010007 | Covington County | <100 beds | June 15, 2022 |
1 | 2 | 2 | 630578923 | St Vincents East | St Vincents East | 50 Medical Park Drive East | Birmingham | AL | 35235 | 01073 | 362 | N | Y | N | Y | 010011 | Jefferson County | >299 beds | June 15, 2022 |
2 | 3 | 3 | 630312913 | Shelby Baptist Medical Center | Shelby Baptist Medical Center | 1000 First Street North | Alabaster | AL | 35007 | 01117 | 252 | N | Y | N | N | 010016 | Shelby County | 100-299 beds | June 15, 2022 |
3 | 4 | 4 | 630459034 | Callahan Eye Foundation Hosp | Callahan Eye Foundation Hosp | 1720 University Boulevard | Birmingham | AL | 35233 | 01073 | 106 | N | Y | N | Y | 010018 | Jefferson County | 100-299 beds | June 15, 2022 |
4 | 5 | 5 | 581973570 | Cherokee Medical Center | Cherokee Medical Center | 400 Northwood Drive | Centre | AL | 35960 | 01019 | 60 | N | N | N | N | 010022 | Cherokee County | <100 beds | June 15, 2022 |
The count is not unique because there are same name all over the states in the U.S.. Therefore, we are going to create a new column called "state_county".
hospitals['state_county']=hospitals['state']+'_'+hospitals['county']
hospitals.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
hospital_id | hospital_org_id | ein | name | name_cr | street_address | city | state | zip_code | fips_state_and_county_code | hospital_bed_count | chrch_affl_f | urban_location_f | children_hospital_f | memb_counc_teach_hosps_f | medicare_provider_number | county | hospital_bed_size | updated_dt | state_county | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 630307951 | Mizell Memorial Hospital | Mizell Memorial Hospital | 702 Main Street | Opp | AL | 36462 | 01039 | 99 | N | N | N | N | 010007 | Covington County | <100 beds | June 15, 2022 | AL_Covington County |
1 | 2 | 2 | 630578923 | St Vincents East | St Vincents East | 50 Medical Park Drive East | Birmingham | AL | 35235 | 01073 | 362 | N | Y | N | Y | 010011 | Jefferson County | >299 beds | June 15, 2022 | AL_Jefferson County |
2 | 3 | 3 | 630312913 | Shelby Baptist Medical Center | Shelby Baptist Medical Center | 1000 First Street North | Alabaster | AL | 35007 | 01117 | 252 | N | Y | N | N | 010016 | Shelby County | 100-299 beds | June 15, 2022 | AL_Shelby County |
3 | 4 | 4 | 630459034 | Callahan Eye Foundation Hosp | Callahan Eye Foundation Hosp | 1720 University Boulevard | Birmingham | AL | 35233 | 01073 | 106 | N | Y | N | Y | 010018 | Jefferson County | 100-299 beds | June 15, 2022 | AL_Jefferson County |
4 | 5 | 5 | 581973570 | Cherokee Medical Center | Cherokee Medical Center | 400 Northwood Drive | Centre | AL | 35960 | 01019 | 60 | N | N | N | N | 010022 | Cherokee County | <100 beds | June 15, 2022 | AL_Cherokee County |
HospitalInfoPerCounty=hospitals.groupby(['state_county'], as_index=False)['hospital_bed_count'].agg('sum')
HospitalInfoPerCounty.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | hospital_bed_count | |
---|---|---|
0 | AK_Anchorage Municipality | 401 |
1 | AK_Fairbanks North Star Borough | 162 |
2 | AK_Kenai Peninsula Borough | 49622 |
3 | AK_Ketchikan Gateway Borough | 25 |
4 | AK_Kodiak Island Borough | 25 |
the number we have in AK_Kenai Peninsula Borough is too high comparing to the others. We need to check the data type of it.
hospitals.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3423 entries, 0 to 3422
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 hospital_id 3423 non-null object
1 hospital_org_id 3423 non-null object
2 ein 3423 non-null object
3 name 3423 non-null object
4 name_cr 3423 non-null object
5 street_address 3423 non-null object
6 city 3423 non-null object
7 state 3423 non-null object
8 zip_code 3423 non-null object
9 fips_state_and_county_code 3423 non-null object
10 hospital_bed_count 3423 non-null object
11 chrch_affl_f 3423 non-null object
12 urban_location_f 3423 non-null object
13 children_hospital_f 3423 non-null object
14 memb_counc_teach_hosps_f 3423 non-null object
15 medicare_provider_number 3423 non-null object
16 county 3422 non-null object
17 hospital_bed_size 3423 non-null object
18 updated_dt 3423 non-null object
19 state_county 3422 non-null object
dtypes: object(20)
memory usage: 535.0+ KB
hospitals['hospital_bed_count'] = hospitals['hospital_bed_count'].astype('int64')
hospitals.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3423 entries, 0 to 3422
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 hospital_id 3423 non-null object
1 hospital_org_id 3423 non-null object
2 ein 3423 non-null object
3 name 3423 non-null object
4 name_cr 3423 non-null object
5 street_address 3423 non-null object
6 city 3423 non-null object
7 state 3423 non-null object
8 zip_code 3423 non-null object
9 fips_state_and_county_code 3423 non-null object
10 hospital_bed_count 3423 non-null int64
11 chrch_affl_f 3423 non-null object
12 urban_location_f 3423 non-null object
13 children_hospital_f 3423 non-null object
14 memb_counc_teach_hosps_f 3423 non-null object
15 medicare_provider_number 3423 non-null object
16 county 3422 non-null object
17 hospital_bed_size 3423 non-null object
18 updated_dt 3423 non-null object
19 state_county 3422 non-null object
dtypes: int64(1), object(19)
memory usage: 535.0+ KB
We are going to create a data frame that is unique by "state_county", and we are going to sum across entries on "hospital_bed_count".
HospitalInfoPerCounty=hospitals.groupby(['state_county'], as_index=False)['hospital_bed_count'].agg('sum')
HospitalInfoPerCounty.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | hospital_bed_count | |
---|---|---|
0 | AK_Anchorage Municipality | 401 |
1 | AK_Fairbanks North Star Borough | 162 |
2 | AK_Kenai Peninsula Borough | 77 |
3 | AK_Ketchikan Gateway Borough | 25 |
4 | AK_Kodiak Island Borough | 25 |
HospitalInfoPerCounty['state_county']=HospitalInfoPerCounty['state_county'].str.replace(' County','')
HospitalInfoPerCounty.head(5)
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | hospital_bed_count | |
---|---|---|
0 | AK_Anchorage Municipality | 401 |
1 | AK_Fairbanks North Star Borough | 162 |
2 | AK_Kenai Peninsula Borough | 77 |
3 | AK_Ketchikan Gateway Borough | 25 |
4 | AK_Kodiak Island Borough | 25 |
import requests
import pandas as pd
import json
import numpy as np
req=requests.get("https://webhooks.mongodb-stitch.com/api/client/v2.0/app/covid-19-qppza/service/REST-API/incoming_webhook/us_only?min_date=2021-01-15T00:00:00.000Z&max_date=2021-01-15T00:00:00.000Z").json()
covid=pd.DataFrame(req)
covid.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
_id | uid | country_iso2 | country_iso3 | country_code | fips | county | state | country | combined_name | population | loc | date | confirmed | deaths | confirmed_daily | deaths_daily | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6335197fb2fc4a05475fedec | 84001001 | US | USA | 840 | 1001.0 | Autauga | Alabama | US | Autauga, Alabama, US | 55869.0 | {'type': 'Point', 'coordinates': [-86.6441, 32... | 2021-01-15T00:00:00.000Z | 5103 | 55 | 28 | 0 |
1 | 6335197fb2fc4a05475ff1c0 | 84001003 | US | USA | 840 | 1003.0 | Baldwin | Alabama | US | Baldwin, Alabama, US | 223234.0 | {'type': 'Point', 'coordinates': [-87.7221, 30... | 2021-01-15T00:00:00.000Z | 16002 | 179 | 161 | 2 |
2 | 6335197fb2fc4a05475ff594 | 84001005 | US | USA | 840 | 1005.0 | Barbour | Alabama | US | Barbour, Alabama, US | 24686.0 | {'type': 'Point', 'coordinates': [-85.3871, 31... | 2021-01-15T00:00:00.000Z | 1712 | 36 | 16 | 0 |
3 | 6335197fb2fc4a05475ff968 | 84001007 | US | USA | 840 | 1007.0 | Bibb | Alabama | US | Bibb, Alabama, US | 22394.0 | {'type': 'Point', 'coordinates': [-87.1251, 32... | 2021-01-15T00:00:00.000Z | 2130 | 47 | 17 | 0 |
4 | 6335197fb2fc4a05475ffd3c | 84001009 | US | USA | 840 | 1009.0 | Blount | Alabama | US | Blount, Alabama, US | 57826.0 | {'type': 'Point', 'coordinates': [-86.5679, 33... | 2021-01-15T00:00:00.000Z | 5264 | 83 | 45 | 3 |
We are going to convert full states name under "state" column to two letter code. In order to do that, we are going to make a library and replace states name by using the library. Any state name matches with its key in the library replace to values in the library.
us_state_to_abbrev = {
"Alabama": "AL","Alaska": "AK","Arizona": "AZ","Arkansas": "AR", "California": "CA",
"Colorado": "CO","Connecticut": "CT","Delaware": "DE","Florida": "FL","Georgia": "GA",
"Hawaii": "HI","Idaho": "ID","Illinois": "IL","Indiana": "IN","Iowa": "IA","Kansas": "KS",
"Kentucky": "KY","Louisiana": "LA","Maine": "ME","Maryland": "MD","Massachusetts": "MA",
"Michigan": "MI","Minnesota": "MN","Mississippi": "MS","Missouri": "MO","Montana": "MT",
"Nebraska": "NE","Nevada": "NV","New Hampshire": "NH","New Jersey": "NJ","New Mexico": "NM",
"New York": "NY","North Carolina": "NC","North Dakota": "ND","Ohio": "OH",
"Oklahoma": "OK","Oregon": "OR","Pennsylvania": "PA","Rhode Island": "RI",
"South Carolina": "SC","South Dakota": "SD","Tennessee": "TN","Texas": "TX",
"Utah": "UT","Vermont": "VT","Virginia": "VA","Washington": "WA",
"West Virginia": "WV","Wisconsin": "WI","Wyoming": "WY",
"District of Columbia": "DC","American Samoa": "AS","Guam": "GU", "Northern Mariana Islands": "MP",
"Puerto Rico": "PR","United States Minor Outlying Islands": "UM","U.S. Virgin Islands": "VI",
}
covid['state'].replace(us_state_to_abbrev, inplace=True)
covid.head(5)
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
_id | uid | country_iso2 | country_iso3 | country_code | fips | county | state | country | combined_name | population | loc | date | confirmed | deaths | confirmed_daily | deaths_daily | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6335197fb2fc4a05475fedec | 84001001 | US | USA | 840 | 1001.0 | Autauga | AL | US | Autauga, Alabama, US | 55869.0 | {'type': 'Point', 'coordinates': [-86.6441, 32... | 2021-01-15T00:00:00.000Z | 5103 | 55 | 28 | 0 |
1 | 6335197fb2fc4a05475ff1c0 | 84001003 | US | USA | 840 | 1003.0 | Baldwin | AL | US | Baldwin, Alabama, US | 223234.0 | {'type': 'Point', 'coordinates': [-87.7221, 30... | 2021-01-15T00:00:00.000Z | 16002 | 179 | 161 | 2 |
2 | 6335197fb2fc4a05475ff594 | 84001005 | US | USA | 840 | 1005.0 | Barbour | AL | US | Barbour, Alabama, US | 24686.0 | {'type': 'Point', 'coordinates': [-85.3871, 31... | 2021-01-15T00:00:00.000Z | 1712 | 36 | 16 | 0 |
3 | 6335197fb2fc4a05475ff968 | 84001007 | US | USA | 840 | 1007.0 | Bibb | AL | US | Bibb, Alabama, US | 22394.0 | {'type': 'Point', 'coordinates': [-87.1251, 32... | 2021-01-15T00:00:00.000Z | 2130 | 47 | 17 | 0 |
4 | 6335197fb2fc4a05475ffd3c | 84001009 | US | USA | 840 | 1009.0 | Blount | AL | US | Blount, Alabama, US | 57826.0 | {'type': 'Point', 'coordinates': [-86.5679, 33... | 2021-01-15T00:00:00.000Z | 5264 | 83 | 45 | 3 |
covid['state_county']=covid['state']+'_'+covid['county']
covid.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
_id | uid | country_iso2 | country_iso3 | country_code | fips | county | state | country | combined_name | population | loc | date | confirmed | deaths | confirmed_daily | deaths_daily | state_county | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6335197fb2fc4a05475fedec | 84001001 | US | USA | 840 | 1001.0 | Autauga | AL | US | Autauga, Alabama, US | 55869.0 | {'type': 'Point', 'coordinates': [-86.6441, 32... | 2021-01-15T00:00:00.000Z | 5103 | 55 | 28 | 0 | AL_Autauga |
1 | 6335197fb2fc4a05475ff1c0 | 84001003 | US | USA | 840 | 1003.0 | Baldwin | AL | US | Baldwin, Alabama, US | 223234.0 | {'type': 'Point', 'coordinates': [-87.7221, 30... | 2021-01-15T00:00:00.000Z | 16002 | 179 | 161 | 2 | AL_Baldwin |
2 | 6335197fb2fc4a05475ff594 | 84001005 | US | USA | 840 | 1005.0 | Barbour | AL | US | Barbour, Alabama, US | 24686.0 | {'type': 'Point', 'coordinates': [-85.3871, 31... | 2021-01-15T00:00:00.000Z | 1712 | 36 | 16 | 0 | AL_Barbour |
3 | 6335197fb2fc4a05475ff968 | 84001007 | US | USA | 840 | 1007.0 | Bibb | AL | US | Bibb, Alabama, US | 22394.0 | {'type': 'Point', 'coordinates': [-87.1251, 32... | 2021-01-15T00:00:00.000Z | 2130 | 47 | 17 | 0 | AL_Bibb |
4 | 6335197fb2fc4a05475ffd3c | 84001009 | US | USA | 840 | 1009.0 | Blount | AL | US | Blount, Alabama, US | 57826.0 | {'type': 'Point', 'coordinates': [-86.5679, 33... | 2021-01-15T00:00:00.000Z | 5264 | 83 | 45 | 3 | AL_Blount |
We will create another data frame with columns that we desire to be within it and we are going to sum across entries on those columns.
CovidPerCounty=covid.groupby(['state_county'], as_index=False).agg({'deaths':'sum','confirmed':'sum','population':'sum','confirmed_daily':'mean','deaths_daily':'mean'})
CovidPerCounty.head(15)
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | deaths | confirmed | population | confirmed_daily | deaths_daily | |
---|---|---|---|---|---|---|
0 | AK_Aleutians East | 0 | 40 | 3337.0 | 1.0 | 0.0 |
1 | AK_Aleutians West | 0 | 242 | 5634.0 | 3.0 | 0.0 |
2 | AK_Anchorage | 129 | 24530 | 288000.0 | 92.0 | 0.0 |
3 | AK_Bethel | 12 | 2911 | 18386.0 | 26.0 | 0.0 |
4 | AK_Bristol Bay | 0 | 0 | 836.0 | 0.0 | 0.0 |
5 | AK_Bristol Bay plus Lake and Peninsula | 0 | 163 | 2428.0 | 1.0 | 0.0 |
6 | AK_Chugach | 3 | 241 | 6751.0 | 0.0 | 0.0 |
7 | AK_Copper River | 0 | 175 | 2699.0 | 1.0 | 0.0 |
8 | AK_Denali | 0 | 63 | 2097.0 | 0.0 | 0.0 |
9 | AK_Dillingham | 1 | 150 | 4916.0 | 1.0 | 0.0 |
10 | AK_Fairbanks North Star | 23 | 5576 | 96849.0 | 46.0 | 1.0 |
11 | AK_Haines | 0 | 23 | 2530.0 | 0.0 | 0.0 |
12 | AK_Hoonah-Angoon | 0 | 0 | 2148.0 | 0.0 | 0.0 |
13 | AK_Juneau | 5 | 1098 | 31974.0 | 4.0 | 0.0 |
14 | AK_Kenai Peninsula | 19 | 3821 | 58708.0 | 11.0 | 0.0 |
We are going to create a new column "mortality" in CovidPerCounty data frame. We will plot this value onto our final plot to compare with people per bed count value for each of counties.
CovidPerCounty['mortality']=CovidPerCounty['deaths']/CovidPerCounty['confirmed']
CovidPerCounty.head(5)
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | deaths | confirmed | population | confirmed_daily | deaths_daily | mortality | |
---|---|---|---|---|---|---|---|
0 | AK_Aleutians East | 0 | 40 | 3337.0 | 1.0 | 0.0 | 0.000000 |
1 | AK_Aleutians West | 0 | 242 | 5634.0 | 3.0 | 0.0 | 0.000000 |
2 | AK_Anchorage | 129 | 24530 | 288000.0 | 92.0 | 0.0 | 0.005259 |
3 | AK_Bethel | 12 | 2911 | 18386.0 | 26.0 | 0.0 | 0.004122 |
4 | AK_Bristol Bay | 0 | 0 | 836.0 | 0.0 | 0.0 | NaN |
Here we are going to merge two data frame and take out the values which can make interference with statistics.
Hospital_Covid_PerCounty=pd.merge(HospitalInfoPerCounty, CovidPerCounty, on='state_county')
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['population']>50000]
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['population']<300000]
Hospital_Covid_PerCounty=Hospital_Covid_PerCounty[Hospital_Covid_PerCounty['confirmed']>10]
Hospital_Covid_PerCounty.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | hospital_bed_count | deaths | confirmed | population | confirmed_daily | deaths_daily | mortality | |
---|---|---|---|---|---|---|---|---|
0 | AL_Baldwin | 240 | 179 | 16002 | 223234.0 | 161.0 | 2.0 | 0.011186 |
1 | AL_Blount | 65 | 83 | 5264 | 57826.0 | 45.0 | 3.0 | 0.015767 |
2 | AL_Calhoun | 552 | 200 | 10982 | 113605.0 | 119.0 | 3.0 | 0.018212 |
8 | AL_Cullman | 115 | 104 | 7960 | 83768.0 | 38.0 | 10.0 | 0.013065 |
9 | AL_Elmore | 69 | 101 | 7723 | 81209.0 | 53.0 | 0.0 | 0.013078 |
In merged data, we are going to create a new column "people_per_bed" in order to compare with mortality.
Hospital_Covid_PerCounty['people_per_bed']=Hospital_Covid_PerCounty['population']/Hospital_Covid_PerCounty['hospital_bed_count']
Hospital_Covid_PerCounty.tail(5)
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
state_county | hospital_bed_count | deaths | confirmed | population | confirmed_daily | deaths_daily | mortality | people_per_bed | |
---|---|---|---|---|---|---|---|---|---|
1652 | WV_Monongalia | 720 | 52 | 6311 | 105612.0 | 67.0 | 1.0 | 0.008240 | 146.683333 |
1657 | WV_Putnam | 70 | 63 | 3444 | 56450.0 | 37.0 | 2.0 | 0.018293 | 806.428571 |
1658 | WV_Raleigh | 173 | 37 | 3505 | 73361.0 | 79.0 | 1.0 | 0.010556 | 424.052023 |
1665 | WV_Wood | 343 | 76 | 5993 | 83518.0 | 54.0 | 6.0 | 0.012681 | 243.492711 |
1669 | WY_Natrona | 217 | 103 | 7039 | 79858.0 | 18.0 | 0.0 | 0.014633 | 368.009217 |
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="ticks")
plot=sns.relplot(data=Hospital_Covid_PerCounty,x="people_per_bed", y="mortality", hue="confirmed", size="population",
sizes=(40, 400)).set(ylim=(0,0.06),xlim=(0,6000))