02-analysis-deliveries.Rmd

---
title: "Doctor deliveries per hospital (AHRQ specs), 2016-2019"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, output_dir = "docs") })
---

By **Christian McDonald**, Assistant Professor of Practice\
School of Journalism and Media, Moody College of Communication\
University of Texas at Austin

----

## Purpose of this analysis

The purpose of this notebook is to look at the number of deliveries per doctor and some subsets of that, answering the following questions:

- How many doctors per hospital are doing deliveries and how many deliveries do they do each year? Extrapolate to deliveries per week.
- Of all deliveries in a year, what percentage of women who went in to give birth got an episiotomy or C-section. Both statewide, and for hospitals of interest.

I DO NOT filter for complications, fetal deaths or other factors. This could be revisited later, but the idea is to see how many people are served for these procedures regardless of complications.

I use the `ATTENDING_PHYSICIAN_UNIF_ID` field to identify the attending physician, described as "Unique identifier assigned to the licensed physician expected to certify medical necessity of services rendered, with primary responsibility for the patient’s medical care and treatment." That indication does not guarantee the doctor performed a specific procedure but billing experts tell us this is a valid indication of a maternal patient's doctor.

> The ATTENDING_PHYSICIAN_UNIF_ID values are suppressed if there are fewer than 5 in a quarter, or if the license was temporary our could not be matched. That appears to be a significant suppression for many hospitals.

## Project setup

```{r setup, echo=T, results='hide', message=F, warning=F}
library(fs)
library(tidyverse)
library(dplyr)
library(janitor)
library(DT)
library(jsonlite)

# suppresses grouping warning
options(dplyr.summarise.inform = FALSE)
```

### Import all deliveries

I start with a subset of the [THCIC in-patient public use data files](https://www.dshs.texas.gov/thcic/hospitals/Inpatientpudf.shtm) that include all deliveries as outlined in AHRQ's [Inpatient Quality Indicators Technical Specifications](https://www.qualityindicators.ahrq.gov/Downloads/Modules/IQI/V2020/TechSpecs/IQI_33_Primary_Cesarean_Delivery_Rate_Uncomplicated.pdf). The `DELOCMD` specification in IQI 33 is "All deliveries, identified by any-listed ICD-10-CM diagnosis code for outcome of delivery". Some cleaning was applied as outlined in the `01-process-ahrq-del-loop` notebook. The result is imported here.

(I also have test data options, which is a small subset of the data for faster processing in testing.)

```{r imports}
test_flag <- F

### test data
path_test <- "data-test/ahrq_del_all_loop_test.rds"

### production data
path_prod <- "data-processed/ahrq_del_cleaned.rds"

### cleaned providers list
providers_full <- read_rds("data-processed/providers_full.rds")

### import based on flag
if (test_flag == T) del <- read_rds(path_test) else del <- read_rds(path_prod)

del %>% nrow()
```

### Set up various processing lists

These are lists from various AHRQ and Leapfrog definitions. See 01-process-lists for details.

```{r lists}
# ICD-10-PCS procedure codes for Cesarean delivery
prcsecp_list <- read_rds("procedures-lists/ahrq_prcsecp.rds") %>% .$prcsecp

# Episiotomy codes
epi_list <- read_rds("procedures-lists/lf_epi.rds") %>% .$epi
epi_list

# surgical procedure columns
surg_cols <- read_rds("procedures-lists/cols_surg.rds") %>% .$surg
```

### Cesarean indicator

Later we want to get rates for Cesarean deliveries. I create a marker column so we can do that.

[AHRQ Inpatient Quality Indicator](https://www.qualityindicators.ahrq.gov/Modules/IQI_TechSpec_ICD10_v2020.aspx) for [IQI 21 Cesarean Delivery Rate, Uncomplicated](https://www.qualityindicators.ahrq.gov/Downloads/Modules/IQI/V2020/TechSpecs/IQI_21_Cesarean_Delivery_Rate_Uncomplicated.pdf) defines a Cesarean delivery as such.

Number of Cesarean deliveries among cases meeting the inclusion and exclusion rules for the
denominator. Cesarean deliveries are identified by any-listed ICD-10-PCS procedure codes for Cesarean delivery (PRCSECP) and without any-listed ICD-10-PCS procedure codes for hysterotomy
(PRCSE2P).

I don't apply the hysterotomy filter since we want all Cesareans.


```{r csec_col}
del_csec <- del %>% 
  mutate(
    PRCSECP = case_when(
      PRINC_SURG_PROC_CODE %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_1 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_2 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_3 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_4 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_5 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_6 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_7 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_8 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_9 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_10 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_11 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_12 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_13 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_14 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_15 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_16 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_17 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_18 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_19 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_20 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_21 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_22 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_23 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_24 %in% prcsecp_list ~ TRUE,
      TRUE                     ~  FALSE
    )
  )

# peek at result
del_csec %>% 
  count(PRCSECP) %>% 
  rename(CASES = n)
```

### Episiotomy indicator

I use the [2019 Leapfrog Hospital Survey p103](https://www.leapfroggroup.org/sites/default/files/Files/2019HospitalSurvey_20190529_v8.0%20%28version%203%29.pdf) for the episiotomy definition. Of note is we are applying this to all deliveries as defined by AHRQ instead of Leapfrog's original definition. The definition difference should be minimal.

```{r epi_col}
# list of codes for episiotomy, which is really one: 0W8NXZZ
del_csec_epi <- del_csec %>% 
  mutate(
    EPI = case_when(
      PRINC_SURG_PROC_CODE %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_1 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_2 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_3 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_4 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_5 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_6 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_7 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_8 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_9 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_10 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_11 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_12 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_13 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_14 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_15 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_16 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_17 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_18 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_19 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_20 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_21 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_22 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_23 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_24 %in% epi_list ~ TRUE,
      TRUE ~ FALSE
    )
  )

del_csec_epi %>% 
  count(EPI) %>% 
  rename(RECORD = n)
```

### Doctor suppression indicator

The ATTENDING_PHYSICIAN_UNIF_ID is "Suppressed when the number of physicians represented in a DRG for a hospital is less than the minimum cell size of five."

- 9999999998: Cell size less than 5
- 9999999999: Temporary license or license number could not be matched

I create a variable if doctor is suppressed ...

```{r supp_add}
suppressed = c("9999999998", "9999999999")

# add value of suppressed doctor to data
del_csec_epi_suppflag <- del_csec_epi %>% 
  mutate(
    PSUPP = if_else(ATTENDING_PHYSICIAN_UNIF_ID %in% suppressed, T, F)
  )

```

> At this point our data is processed.

---

## Understanding the scope of doctor suppressions

Use the created `PSPP` field to count overall suppressions to get idea of scope.

```{r supp_look}
del_csec_epi_suppflag %>% 
  tabyl(PSUPP) %>% 
  rename(count = n) %>% 
  adorn_pct_formatting()
```

The ATTENDING_PHYSICIAN is suppressed in 6.5% of deliveries. For some hospitals, like Brownwood Regional Medical Center, all the physician data is suppressed.

```{r supp_count}
del_csec_epi_suppflag %>% 
  filter(str_detect(PROVIDER_NAME, "Brownwood Regional Medical Center")) %>% 
  tabyl(PSUPP) %>% 
  rename(count = n) %>% 
  adorn_pct_formatting()
```

### Laredo suppressions

Since our hospitals of interest are in Laredo, let's check how many of those are suppressed as well.

```{r supp_count_laredo}
del_csec_epi_suppflag %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo")) %>% 
  tabyl(PSUPP, PROVIDER_NAME) %>% 
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 2) %>%
  adorn_ns()
```

### Doctor suppression rates per hospital in 2019

Later we want to know which hospitals have high doctor suppression rates, which would indicate that our deliveries per doctor calculations could be suspect. If the value for `SUSPECT` is `TRUE`, then the suppression rate is higher than 10% or otherwise suspect.

```{r supp_rate_hosp}
supp_rate_hosp <- del_csec_epi_suppflag %>% 
  group_by(THCIC_ID, PROVIDER_NAME, PSUPP) %>% 
  summarize(
    CNT = n()
  ) %>% 
  pivot_wider(names_from = PSUPP, values_from = CNT) %>% 
  rename(
    NSUPP = `FALSE`,
    YSUPP = `TRUE`
  ) %>% 
  mutate(
    S_RT = (YSUPP / (NSUPP + YSUPP)) %>% round_half_up(2),
    SUSPECT = case_when(S_RT < .1 ~ F,
      TRUE ~ T)
  )

supp_rate_hosp %>% head(10)
```

### Removing suppressed doctors

To achieve a count of deliveries per doctor, I remove the records where the ATTENDING_PHYSICIAN was suppressed as this could combine multiple doctors into the same group. This suppression may have more affect on some hospitals than others.

```{r del_supp}
del_supp <- del_csec_epi_suppflag %>% 
  filter(
    !ATTENDING_PHYSICIAN_UNIF_ID %in% suppressed
  )
```


## 2019 deliveries per doctor, per week

How many deliveries per doctor were there in 2019? How does that extrapolate to average deliveries per week per doctor? Starts with suppressed doctors removed.

> While we have all four quarters of 2019, there could be some records missing as hospitals have 60 days past end of quarter to file those records.


```{r del_doc_hosp_2019}
# Weeks in a year
week_var = 52.143

del_doc_hosp_2019 <- del_supp %>% 
  filter(YR == 2019) %>% 
  group_by(THCIC_ID,PROVIDER_NAME) %>% 
  summarize(
    PHYSICIANS = n_distinct(ATTENDING_PHYSICIAN_UNIF_ID),
    DELIVERIES = n()
  ) %>% 
  mutate(
    DELSPERDOC = (DELIVERIES / PHYSICIANS) %>% round_half_up(1),
    DELSPERWK = (DELSPERDOC / week_var) %>% round_half_up(1)
  ) %>%
  arrange(DELSPERWK %>% desc())

# get suppression suspect value
srh <- supp_rate_hosp %>% 
  select(THCIC_ID, PROVIDER_NAME, SUSPECT)

del_doc_hosp_2019 <- del_doc_hosp_2019 %>% 
  left_join(srh, by = c("THCIC_ID", "PROVIDER_NAME")) %>% 
  # filter out hospitals with suspect numbers due to suppression
  filter(SUSPECT == F) %>%
  select(-SUSPECT)

del_doc_hosp_2019 %>% datatable()

```

## Total deliveries per hospital per year

Total deliveries after filtering years with fewer than 30 deliveries. This does NOT filter out suppressed doctors since we are looking for a total delivery count.

```{r deliveries_yr}

# group and count
deliveries_yr <- del_csec_epi_suppflag %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME) %>% 
  summarize(BIRTHS = n()) %>% 
  filter(BIRTHS > 30)

# pivot for table
deliveries_yr %>%
  pivot_wider(names_from = YR, values_from = BIRTHS) %>%
  datatable()
```

### Count of hospitals in study

We get a distinct count of the hospitals studied to include in the story.

```{r hosp_cnt}
deliveries_yr %>% 
  ungroup() %>% 
  select(-YR) %>% 
  distinct(THCIC_ID, PROVIDER_NAME) %>% 
  nrow()
```

### Count of hospitals in study, 2019

We do the same for just 2019.

```{r hosp_cnt_2019}
deliveries_yr %>% 
  ungroup() %>% 
  filter(YR == "2019") %>% 
  distinct(THCIC_ID, PROVIDER_NAME) %>% 
  nrow()
```

## Cesarean rates by hospital

Here we look at all deliveries to see how many mothers have Cesareans. Note this starts with all ATTENDING_PHYSICIANS, including those who's IDs are suppressed. These are also all Cesareans, complicated or not. This is a simple rate of Cesarean births / All births in order to get an idea how many are performed.

Excludes hospitals with fewer than 120 deliveries in our time span.

```{r csec_rate_hosp}
csec_rate_hosp <- del_csec_epi_suppflag %>% 
  group_by(THCIC_ID, PROVIDER_NAME, PRCSECP) %>% 
  count(PRCSECP) %>% 
  rename(CASES = n) %>%
  pivot_wider(names_from = PRCSECP, values_from = CASES) %>% 
  rename(NCSEC = "FALSE", CSEC = "TRUE") %>% 
  mutate(
    TOTAL = NCSEC + CSEC,
    RATE = round_half_up((CSEC / TOTAL) * 100, 1)
  ) %>% 
  filter(
    TOTAL >= 120 
  ) %>% 
  arrange(RATE %>% desc())

csec_rate_hosp %>% 
  datatable()
```

### Cesearean rate for Laredo

```{r csec_rate_hosp_laredo}
csec_rate_hosp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))
```

### Cesarean rates by hospital by year

This is the same calculation as above, but by year, showing only the rates. Excludes hospitals with fewer than 30 deliveries in a given year.

```{r csec_hosp_rate_yr}
csec_hosp_rate_yr <- del_csec_epi_suppflag %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME, PRCSECP) %>% 
  count(YR, PRCSECP) %>%
  rename(CASES = n) %>%
  pivot_wider(names_from = PRCSECP, values_from = CASES) %>%
  rename(NCSEC = "FALSE", CSEC = "TRUE") %>%
  mutate(
    TOTAL = NCSEC + CSEC,
    CRATE = round_half_up((CSEC / TOTAL) * 100, 1)
  ) %>%
  filter(
    TOTAL >= 30
  )
```

Present the rate as table.

```{r csec_hosp_rate_yr_table}
csec_hosp_rate_yr_table <- csec_hosp_rate_yr %>%
  select(THCIC_ID, YR, PROVIDER_NAME, CRATE) %>%
  pivot_wider(names_from = YR, values_from = CRATE) %>%
  arrange(`2019` %>% desc())

csec_hosp_rate_yr_table %>% 
  datatable()
```


### Cesarean rates by year for Laredo

```{r csec_hosp_rate_yr_laredo}
csec_hosp_rate_yr_table %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))
```


## Episiotomy rates

Like the simple Cesarean rate above, here we look at all deliveries and note how many involved episiotomies. Note this starts with all ATTENDING_PHYSICIANS, including those who's IDs are suppressed. Excludes hospitals with fewer than 120 deliveries in our time span.

```{r epi_rate_hosp}
epi_rate_hosp <- del_csec_epi_suppflag %>% 
  group_by(THCIC_ID, PROVIDER_NAME, EPI) %>% 
  count(EPI) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPI, values_from = CASES) %>% 
  rename(
    EPI = "TRUE",
    NEPI = "FALSE"
  ) %>% 
  mutate(
    TOTAL = NEPI + EPI,
    ERATE = round_half_up((EPI / TOTAL * 100), 1)
  ) %>% 
  filter(TOTAL >= 120) %>% 
  arrange(ERATE %>% desc())

epi_rate_hosp %>% 
  datatable()
```

### Episiotomy rate for Laredo hospitals

```{r epi_rate_hosp_laredo}
epi_rate_hosp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))
```

## Episiotomy rates by hospital by year

Values exluded where there were fewer than 30 deliveries in a given year.

```{r epi_rate_hosp_yr}
epi_rate_hosp_yr <- del_csec_epi_suppflag %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME, EPI) %>% 
  count(YR, EPI) %>%
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPI, values_from = CASES) %>% 
  rename(NEPI = "FALSE", EPI = "TRUE") %>% 
  mutate(
    TOTAL = NEPI + EPI,
    ERATE = round_half_up((EPI / TOTAL) * 100, 1)
  ) %>% 
  filter(
    TOTAL >= 30 
  ) %>% 
  select(YR, THCIC_ID, PROVIDER_NAME, ERATE)
  
# pivot for table
epi_rate_hosp_yr_table <- epi_rate_hosp_yr %>% 
  pivot_wider(names_from = YR, values_from = ERATE) %>% 
  arrange(`2019` %>% desc())

#peek at table
epi_rate_hosp_yr_table %>% 
  datatable()
```

### Episiotomy rate by year for Laredo

```{r epi_rate_hosp_yr_laredo}
epi_rate_hosp_yr %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))
```

## Either Episiotomy or Cesarean

This looks to see if a mother had either a cesarean or an episiotomy. All the same things apply: All doctors, all complications, but hospitals with fewer than 120 deliveries in our time period are excluded.

```{r epicsec_rate_hosp}
epicsec_rate_hosp <- del_csec_epi_suppflag %>% 
  mutate(
        EPICSEC = if_else((EPI == T | PRCSECP) == T, T, F)
  ) %>% 
  count(THCIC_ID, PROVIDER_NAME, EPICSEC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPICSEC, values_from = CASES) %>% 
  rename(
    EPICSEC = "TRUE",
    NEPICSEC = "FALSE"
  ) %>% 
  mutate(
    TOTAL = EPICSEC + NEPICSEC,
    ECRATE = round_half_up((EPICSEC / TOTAL * 100), 1)
  ) %>% 
  filter(TOTAL >= 120) %>% 
  arrange(ECRATE %>% desc())

epicsec_rate_hosp %>% 
  datatable()
```

### Either episiotomy or Cesarean for Laredo

```{r epicsec_rate_hosp_laredo}
epicsec_rate_hosp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))
```

### Episiotomy or Cesarean rate by hospital by year

This uses all deliveries. Hospital excluded if fewer than 30 deliveries a year.

```{r epicsec_rate_hosp_yr}
# prep data for table
epicsec_rate_hosp_yr <- del_csec_epi_suppflag %>% 
  mutate(
    EPICSEC = if_else((EPI == T | PRCSECP) == T, T, F)
  ) %>% 
  count(THCIC_ID, YR, PROVIDER_NAME, EPICSEC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPICSEC, values_from = CASES) %>% 
  rename(
    EPICSEC = "TRUE",
    NEPICSEC = "FALSE"
  ) %>% 
  mutate(
    TOTAL = EPICSEC + NEPICSEC,
    ECRATE = round_half_up((EPICSEC / TOTAL * 100), 1)
  ) %>% 
  filter(TOTAL >= 30)

# pivot data for table
epicsec_rate_hosp_yr_table <- epicsec_rate_hosp_yr %>% 
  select(THCIC_ID, YR, PROVIDER_NAME, ECRATE) %>% 
  pivot_wider(names_from = YR, values_from = ECRATE) %>% 
  arrange(`2019` %>% desc())

epicsec_rate_hosp_yr_table %>% 
  datatable()
```

### Episiotomy or Cesarean rate by year for Laredo

```{r epicsec_rate_hosp_yr_laredo}
epicsec_rate_hosp_yr_table %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))
```

## Get Medicaid rate 2019

One value we don't have yet is the percentage of patients using Medicaid.

```{r mediciad_2019}
mediciad_2019 <- del_csec_epi_suppflag %>% 
  # filter for 2019
  filter(
    YR == 2019
  ) %>% 
  # create MC col
  mutate(
    MC = if_else(FIRST_PAYMENT_SRC == "MC", T, F),
    MC = if_else(is.na(MC), F, MC)
  ) %>% 
  # group and count MC
  group_by(THCIC_ID, PROVIDER_NAME, PROVIDER_CITY) %>% 
  count(MC) %>% 
  # pivot to shape to get rate
  pivot_wider(names_from = MC, values_from = n) %>% 
  # get rate
  mutate(
    MCRATE = ((`TRUE` / (`TRUE` + `FALSE`)) * 100) %>% round_half_up(1)
  )

# peek as table
mediciad_2019 %>% datatable()
```

## 2019 by hospital summary

Building a single dataframe that has summary data by hospital. This is for the interactive. Before we can join them all, we need to pare some calcs down to 2019 only.

### Get epi rate for 2019

Extrating 2019 for joining later.

```{r epi_rate_hosp_2019}
epi_rate_hosp_2019 <- epi_rate_hosp_yr %>% 
  filter(
    YR == 2019
  ) %>% 
  ungroup() %>% 
  select(THCIC_ID, ERATE)
```

### Get cesarean rate for 2019

```{r csec_hosp_rate_2019}
csec_hosp_rate_2019 <- csec_hosp_rate_yr %>% 
  filter(
    YR == 2019
  ) %>% 
  ungroup() %>% 
  select(THCIC_ID, CRATE)
```


## Quality checks

### Difference when removing infrequent deliveries

In this view I remove doctors with fewer than 10 deliveries in a given year to give a more accurate look at the staff doctors that do the bulk of deliveries. Suppressed doctor records are removed.

> The choice of 10 deliveries as a threshold is arbitrary and could be changed if we deem necessary.

```{r del_hosp_doc_freq}
del_hosp_doc_freq <- del_supp %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME, ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  summarize(DELIVERIES = n()) %>% 
  rename(PHYSICIAN = ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  filter(DELIVERIES >= 10) %>% 
  group_by(THCIC_ID, PROVIDER_NAME) %>% 
  summarize(
    DELIVERIES = sum(DELIVERIES),
    PHYSICIANS = n_distinct(PHYSICIAN)
  ) %>% 
  mutate(
    DRATE = round_half_up(DELIVERIES/PHYSICIANS,1)
  ) %>% 
  arrange(DRATE %>% desc())

del_hosp_doc_freq %>% 
  datatable()
```

### Deliveries per doctor at Laredo hospitals

Delivery rate per doctor by year. Unknown doctors are suppressed. Infrequent deliveries (<20/year) are filtered out.

```{r del_hosp_doc_freq_yr_laredo}
del_supp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo")) %>%
  group_by(YR, THCIC_ID, PROVIDER_NAME, ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  summarize(DELIVERIES = n()) %>% 
  rename(PHYSICIAN = ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  filter(DELIVERIES >= 10) %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME) %>% 
  summarize(
    DELIVERIES = sum(DELIVERIES),
    PHYSICIANS = n_distinct(PHYSICIAN)
  ) %>% 
  mutate(
    RATE = round_half_up(DELIVERIES/PHYSICIANS, 1)
  ) %>% 
  select(YR, THCIC_ID, PROVIDER_NAME, RATE) %>% 
  pivot_wider(names_from = YR, values_from = RATE)
```

Just how much does doctor suppression affect these rates in Laredo? We'll do the same calculation on the data but without suppressing unknown doctors. Note this has limitations because multiple doctors could be counted in the two suppression values.

```{r del_hosp_doc_freq_yr_laredo_nosupp}
del %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo")) %>%
  group_by(YR, THCIC_ID, PROVIDER_NAME, ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  summarize(DELIVERIES = n()) %>% 
  rename(PHYSICIAN = ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  filter(DELIVERIES >= 10) %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME) %>% 
  summarize(
    DELIVERIES = sum(DELIVERIES),
    PHYSICIANS = n_distinct(PHYSICIAN)
  ) %>% 
  mutate(
    RATE = round_half_up(DELIVERIES/PHYSICIANS, 1)
  ) %>% 
  select(YR, THCIC_ID, PROVIDER_NAME, RATE) %>% 
  pivot_wider(names_from = YR, values_from = RATE)
```


## Summaries

### Medicaid summary

Get the rate of medicaid deliveries statewide.

```{r medi_rate_tx_yr_summary}
medi_rate_tx_yr_summary <- del %>% 
  mutate(
    MC = if_else(FIRST_PAYMENT_SRC == "MC", T, F),
    MC = if_else(is.na(MC), F, MC)
  ) %>% 
  count(YR, MC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = MC, values_from = CASES) %>% 
  rename(MCF = `FALSE`, MCT = `TRUE`) %>% 
  mutate(
    SUMMARY = "TX",
    CATEGORY = "MEDICARE",
    MEASUREMENT = "RATE",
    TOTAL = MCF + MCT,
    VALUE = round_half_up((MCT / TOTAL) * 100, 1) # RATE
  ) %>% 
  select(YR, SUMMARY, CATEGORY, MEASUREMENT, VALUE)

medi_rate_tx_yr_summary
```


### Cesarean summary

Get the rate of cesearean deliveries across the state.

```{r csec_rate_tx_yr_summary}
csec_rate_tx_yr_summary <- del_csec %>% 
  group_by(YR, PRCSECP) %>% 
  count(PRCSECP) %>% 
  rename(CASES = n) %>%
  pivot_wider(names_from = PRCSECP, values_from = CASES) %>% 
  rename(NCSEC = "FALSE", CSEC = "TRUE") %>% 
  mutate(
    SUMMARY = "TX",
    CATEGORY = "CESAREAN",
    MEASUREMENT = "RATE",
    TOTAL = NCSEC + CSEC,
    VALUE = round_half_up((CSEC / TOTAL) * 100, 1) # RATE
  ) %>% 
  select(YR, SUMMARY, CATEGORY, MEASUREMENT, VALUE)

csec_rate_tx_yr_summary
```

### Episiotomy & Cesarean summary

Get the percentage of deliveries where either an episiotomy or cesarean were performed.

```{r epicsec_rate_tx_yr_summary}
epicsec_rate_tx_yr_summary <- del_csec_epi %>% 
  mutate(
        EPICSEC = if_else((EPI == T | PRCSECP) == T, T, F)
  ) %>% 
  group_by(YR, EPICSEC) %>% 
  count(EPICSEC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPICSEC, values_from = CASES) %>% 
  rename(
    EPICSEC = "TRUE",
    NEPICSEC = "FALSE"
  ) %>% 
  mutate(
    SUMMARY = "TX",
    CATEGORY = "EPI_OR_CSEC",
    MEASUREMENT = "RATE",
    TOTAL = EPICSEC + NEPICSEC,
    VALUE = round_half_up((EPICSEC / TOTAL * 100), 1) # RATE
  ) %>% 
  select(YR, SUMMARY, CATEGORY, MEASUREMENT, VALUE)

epicsec_rate_tx_yr_summary
```

### Episiotomy or Cesarean rate by hospital: Averaged, by year

```{r epicsec_rate_hosp_yr_summary}
epicsec_rate_hosp_yr_summary <- epicsec_rate_hosp_yr %>% 
  ungroup() %>% 
  group_by(YR) %>% 
  summarize(
    SUMMARY = "HOSPITAL",
    CATEGORY = "EPI_OR_CSEC",
    MEASUREMENT = "MEAN_OF_RATE",
    # MEDIAN = median(RATE),
    VALUE = round_half_up(mean(ECRATE, na.rm = TRUE),1) # MEAN OF RATE
  )

epicsec_rate_hosp_yr_summary
```

## Data for Interactive: Hospital blurbs

Here we start with cleaned hospitals list and combine to get the medicare rate, deliveries per week and maternal levels. Some values were calculated from data where doctors were suppressed, and so may be NA.

The maternal levels were manually collected into a spreadsheet (and updated Nov. 24, 2020) and then used here.

```{r blurbs_2019}
# Import maternal levels

maternal_url <- "https://docs.google.com/spreadsheets/d/1efUxctHiY4cCgrqqHvBVx7jZobbx5Dgmpbf_YCd7tcE/gviz/tq?tqx=out:csv"
maternal_levels <- read_csv(maternal_url) %>% 
  mutate(
    THCIC_ID = as.character(THCIC_ID)
  )

blurbs_2019 <- providers_full %>% 
  rename(PROVIDER_NAME = PROVIDER_NAME_CLEANED) %>% 
  select(-PROVIDER_CITY) %>% 
  left_join(
    mediciad_2019 %>%
      ungroup() %>% 
      select(THCIC_ID, MCRATE),
    by = "THCIC_ID") %>% 
  # get deliveries per wk
  left_join(
    del_doc_hosp_2019 %>%
      ungroup() %>%
      select(THCIC_ID, DELSPERWK),
      by = "THCIC_ID"
    ) %>% 
  left_join(maternal_levels %>% select(-PROVIDER_NAME))

blurbs_2019 %>% head(10)
```

## Writing files and closing out

For export:

- blurbs_2019.json: Data: Hospital, birth per doc rate (weekly), Medicaid percentage, Maternal level.

Plus summary files for another notebook.

```{r write}

if (test_flag == F) medi_rate_tx_yr_summary %>%
  write_rds("data-processed/medi_rate_tx_yr_summary.rds")

if (test_flag == F) csec_rate_tx_yr_summary %>%
  write_rds("data-processed/csec_rate_tx_yr_summary.rds")

if (test_flag == F) epicsec_rate_tx_yr_summary %>%
  write_rds("data-processed/epicsec_rate_tx_yr_summary.rds")

if (test_flag == F) epicsec_rate_hosp_yr_summary %>%
  write_rds("data-processed/epicsec_rate_hosp_yr_summary.rds")

if (test_flag == F) blurbs_2019 %>%
  arrange(PROVIDER_NAME) %>% 
  write_json("exports/blurbs_2019.json")

# A klaxon to indicate the processing is complete
beepr::beep(4)
```