By Christian McDonald, Assistant Professor of Practice
School of Journalism and Media, Moody College of Communication
University of Texas at Austin


Purpose of this analysis

The purpose of this notebook is to look at the number of deliveries per doctor and some subsets of that, answering the following questions:

I DO NOT filter for complications, fetal deaths or other factors. This could be revisited later, but the idea is to see how many people are served for these procedures regardless of complications.

I use the ATTENDING_PHYSICIAN_UNIF_ID field to identify the attending physician, described as “Unique identifier assigned to the licensed physician expected to certify medical necessity of services rendered, with primary responsibility for the patient’s medical care and treatment.” That indication does not guarantee the doctor performed a specific procedure but billing experts tell us this is a valid indication of a maternal patient’s doctor.

The ATTENDING_PHYSICIAN_UNIF_ID values are suppressed if there are fewer than 5 in a quarter, or if the license was temporary our could not be matched. That appears to be a significant suppression for many hospitals.

Project setup

library(fs)
library(tidyverse)
library(dplyr)
library(janitor)
library(DT)
library(jsonlite)

# suppresses grouping warning
options(dplyr.summarise.inform = FALSE)

Import all deliveries

I start with a subset of the THCIC in-patient public use data files that include all deliveries as outlined in AHRQ’s Inpatient Quality Indicators Technical Specifications. The DELOCMD specification in IQI 33 is “All deliveries, identified by any-listed ICD-10-CM diagnosis code for outcome of delivery”. Some cleaning was applied as outlined in the 01-process-ahrq-del-loop notebook. The result is imported here.

(I also have test data options, which is a small subset of the data for faster processing in testing.)

test_flag <- F

### test data
path_test <- "data-test/ahrq_del_all_loop_test.rds"

### production data
path_prod <- "data-processed/ahrq_del_cleaned.rds"

### cleaned providers list
providers_full <- read_rds("data-processed/providers_full.rds")

### import based on flag
if (test_flag == T) del <- read_rds(path_test) else del <- read_rds(path_prod)

del %>% nrow()
## [1] 1457413

Set up various processing lists

These are lists from various AHRQ and Leapfrog definitions. See 01-process-lists for details.

# ICD-10-PCS procedure codes for Cesarean delivery
prcsecp_list <- read_rds("procedures-lists/ahrq_prcsecp.rds") %>% .$prcsecp

# Episiotomy codes
epi_list <- read_rds("procedures-lists/lf_epi.rds") %>% .$epi
epi_list
## [1] "0W8NXZZ"
# surgical procedure columns
surg_cols <- read_rds("procedures-lists/cols_surg.rds") %>% .$surg

Cesarean indicator

Later we want to get rates for Cesarean deliveries. I create a marker column so we can do that.

AHRQ Inpatient Quality Indicator for IQI 21 Cesarean Delivery Rate, Uncomplicated defines a Cesarean delivery as such.

Number of Cesarean deliveries among cases meeting the inclusion and exclusion rules for the denominator. Cesarean deliveries are identified by any-listed ICD-10-PCS procedure codes for Cesarean delivery (PRCSECP) and without any-listed ICD-10-PCS procedure codes for hysterotomy (PRCSE2P).

I don’t apply the hysterotomy filter since we want all Cesareans.

del_csec <- del %>% 
  mutate(
    PRCSECP = case_when(
      PRINC_SURG_PROC_CODE %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_1 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_2 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_3 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_4 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_5 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_6 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_7 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_8 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_9 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_10 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_11 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_12 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_13 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_14 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_15 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_16 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_17 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_18 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_19 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_20 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_21 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_22 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_23 %in% prcsecp_list ~ TRUE,
      OTH_SURG_PROC_CODE_24 %in% prcsecp_list ~ TRUE,
      TRUE                     ~  FALSE
    )
  )

# peek at result
del_csec %>% 
  count(PRCSECP) %>% 
  rename(CASES = n)

Episiotomy indicator

I use the 2019 Leapfrog Hospital Survey p103 for the episiotomy definition. Of note is we are applying this to all deliveries as defined by AHRQ instead of Leapfrog’s original definition. The definition difference should be minimal.

# list of codes for episiotomy, which is really one: 0W8NXZZ
del_csec_epi <- del_csec %>% 
  mutate(
    EPI = case_when(
      PRINC_SURG_PROC_CODE %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_1 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_2 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_3 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_4 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_5 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_6 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_7 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_8 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_9 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_10 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_11 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_12 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_13 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_14 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_15 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_16 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_17 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_18 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_19 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_20 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_21 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_22 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_23 %in% epi_list ~ TRUE,
      OTH_SURG_PROC_CODE_24 %in% epi_list ~ TRUE,
      TRUE ~ FALSE
    )
  )

del_csec_epi %>% 
  count(EPI) %>% 
  rename(RECORD = n)

Doctor suppression indicator

The ATTENDING_PHYSICIAN_UNIF_ID is “Suppressed when the number of physicians represented in a DRG for a hospital is less than the minimum cell size of five.”

  • 9999999998: Cell size less than 5
  • 9999999999: Temporary license or license number could not be matched

I create a variable if doctor is suppressed …

suppressed = c("9999999998", "9999999999")

# add value of suppressed doctor to data
del_csec_epi_suppflag <- del_csec_epi %>% 
  mutate(
    PSUPP = if_else(ATTENDING_PHYSICIAN_UNIF_ID %in% suppressed, T, F)
  )

At this point our data is processed.


Understanding the scope of doctor suppressions

Use the created PSPP field to count overall suppressions to get idea of scope.

del_csec_epi_suppflag %>% 
  tabyl(PSUPP) %>% 
  rename(count = n) %>% 
  adorn_pct_formatting()

The ATTENDING_PHYSICIAN is suppressed in 6.5% of deliveries. For some hospitals, like Brownwood Regional Medical Center, all the physician data is suppressed.

del_csec_epi_suppflag %>% 
  filter(str_detect(PROVIDER_NAME, "Brownwood Regional Medical Center")) %>% 
  tabyl(PSUPP) %>% 
  rename(count = n) %>% 
  adorn_pct_formatting()

Laredo suppressions

Since our hospitals of interest are in Laredo, let’s check how many of those are suppressed as well.

del_csec_epi_suppflag %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo")) %>% 
  tabyl(PSUPP, PROVIDER_NAME) %>% 
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 2) %>%
  adorn_ns()

Doctor suppression rates per hospital in 2019

Later we want to know which hospitals have high doctor suppression rates, which would indicate that our deliveries per doctor calculations could be suspect. If the value for SUSPECT is TRUE, then the suppression rate is higher than 10% or otherwise suspect.

supp_rate_hosp <- del_csec_epi_suppflag %>% 
  group_by(THCIC_ID, PROVIDER_NAME, PSUPP) %>% 
  summarize(
    CNT = n()
  ) %>% 
  pivot_wider(names_from = PSUPP, values_from = CNT) %>% 
  rename(
    NSUPP = `FALSE`,
    YSUPP = `TRUE`
  ) %>% 
  mutate(
    S_RT = (YSUPP / (NSUPP + YSUPP)) %>% round_half_up(2),
    SUSPECT = case_when(S_RT < .1 ~ F,
      TRUE ~ T)
  )

supp_rate_hosp %>% head(10)

Removing suppressed doctors

To achieve a count of deliveries per doctor, I remove the records where the ATTENDING_PHYSICIAN was suppressed as this could combine multiple doctors into the same group. This suppression may have more affect on some hospitals than others.

del_supp <- del_csec_epi_suppflag %>% 
  filter(
    !ATTENDING_PHYSICIAN_UNIF_ID %in% suppressed
  )

2019 deliveries per doctor, per week

How many deliveries per doctor were there in 2019? How does that extrapolate to average deliveries per week per doctor? Starts with suppressed doctors removed.

While we have all four quarters of 2019, there could be some records missing as hospitals have 60 days past end of quarter to file those records.

# Weeks in a year
week_var = 52.143

del_doc_hosp_2019 <- del_supp %>% 
  filter(YR == 2019) %>% 
  group_by(THCIC_ID,PROVIDER_NAME) %>% 
  summarize(
    PHYSICIANS = n_distinct(ATTENDING_PHYSICIAN_UNIF_ID),
    DELIVERIES = n()
  ) %>% 
  mutate(
    DELSPERDOC = (DELIVERIES / PHYSICIANS) %>% round_half_up(1),
    DELSPERWK = (DELSPERDOC / week_var) %>% round_half_up(1)
  ) %>%
  arrange(DELSPERWK %>% desc())

# get suppression suspect value
srh <- supp_rate_hosp %>% 
  select(THCIC_ID, PROVIDER_NAME, SUSPECT)

del_doc_hosp_2019 <- del_doc_hosp_2019 %>% 
  left_join(srh, by = c("THCIC_ID", "PROVIDER_NAME")) %>% 
  # filter out hospitals with suspect numbers due to suppression
  filter(SUSPECT == F) %>%
  select(-SUSPECT)

del_doc_hosp_2019 %>% datatable()

Total deliveries per hospital per year

Total deliveries after filtering years with fewer than 30 deliveries. This does NOT filter out suppressed doctors since we are looking for a total delivery count.

# group and count
deliveries_yr <- del_csec_epi_suppflag %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME) %>% 
  summarize(BIRTHS = n()) %>% 
  filter(BIRTHS > 30)

# pivot for table
deliveries_yr %>%
  pivot_wider(names_from = YR, values_from = BIRTHS) %>%
  datatable()

Count of hospitals in study

We get a distinct count of the hospitals studied to include in the story.

deliveries_yr %>% 
  ungroup() %>% 
  select(-YR) %>% 
  distinct(THCIC_ID, PROVIDER_NAME) %>% 
  nrow()
## [1] 263

Count of hospitals in study, 2019

We do the same for just 2019.

deliveries_yr %>% 
  ungroup() %>% 
  filter(YR == "2019") %>% 
  distinct(THCIC_ID, PROVIDER_NAME) %>% 
  nrow()
## [1] 223

Cesarean rates by hospital

Here we look at all deliveries to see how many mothers have Cesareans. Note this starts with all ATTENDING_PHYSICIANS, including those who’s IDs are suppressed. These are also all Cesareans, complicated or not. This is a simple rate of Cesarean births / All births in order to get an idea how many are performed.

Excludes hospitals with fewer than 120 deliveries in our time span.

csec_rate_hosp <- del_csec_epi_suppflag %>% 
  group_by(THCIC_ID, PROVIDER_NAME, PRCSECP) %>% 
  count(PRCSECP) %>% 
  rename(CASES = n) %>%
  pivot_wider(names_from = PRCSECP, values_from = CASES) %>% 
  rename(NCSEC = "FALSE", CSEC = "TRUE") %>% 
  mutate(
    TOTAL = NCSEC + CSEC,
    RATE = round_half_up((CSEC / TOTAL) * 100, 1)
  ) %>% 
  filter(
    TOTAL >= 120 
  ) %>% 
  arrange(RATE %>% desc())

csec_rate_hosp %>% 
  datatable()

Cesearean rate for Laredo

csec_rate_hosp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))

Cesarean rates by hospital by year

This is the same calculation as above, but by year, showing only the rates. Excludes hospitals with fewer than 30 deliveries in a given year.

csec_hosp_rate_yr <- del_csec_epi_suppflag %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME, PRCSECP) %>% 
  count(YR, PRCSECP) %>%
  rename(CASES = n) %>%
  pivot_wider(names_from = PRCSECP, values_from = CASES) %>%
  rename(NCSEC = "FALSE", CSEC = "TRUE") %>%
  mutate(
    TOTAL = NCSEC + CSEC,
    CRATE = round_half_up((CSEC / TOTAL) * 100, 1)
  ) %>%
  filter(
    TOTAL >= 30
  )

Present the rate as table.

csec_hosp_rate_yr_table <- csec_hosp_rate_yr %>%
  select(THCIC_ID, YR, PROVIDER_NAME, CRATE) %>%
  pivot_wider(names_from = YR, values_from = CRATE) %>%
  arrange(`2019` %>% desc())

csec_hosp_rate_yr_table %>% 
  datatable()

Cesarean rates by year for Laredo

csec_hosp_rate_yr_table %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))

Episiotomy rates

Like the simple Cesarean rate above, here we look at all deliveries and note how many involved episiotomies. Note this starts with all ATTENDING_PHYSICIANS, including those who’s IDs are suppressed. Excludes hospitals with fewer than 120 deliveries in our time span.

epi_rate_hosp <- del_csec_epi_suppflag %>% 
  group_by(THCIC_ID, PROVIDER_NAME, EPI) %>% 
  count(EPI) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPI, values_from = CASES) %>% 
  rename(
    EPI = "TRUE",
    NEPI = "FALSE"
  ) %>% 
  mutate(
    TOTAL = NEPI + EPI,
    ERATE = round_half_up((EPI / TOTAL * 100), 1)
  ) %>% 
  filter(TOTAL >= 120) %>% 
  arrange(ERATE %>% desc())

epi_rate_hosp %>% 
  datatable()

Episiotomy rate for Laredo hospitals

epi_rate_hosp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))

Episiotomy rates by hospital by year

Values exluded where there were fewer than 30 deliveries in a given year.

epi_rate_hosp_yr <- del_csec_epi_suppflag %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME, EPI) %>% 
  count(YR, EPI) %>%
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPI, values_from = CASES) %>% 
  rename(NEPI = "FALSE", EPI = "TRUE") %>% 
  mutate(
    TOTAL = NEPI + EPI,
    ERATE = round_half_up((EPI / TOTAL) * 100, 1)
  ) %>% 
  filter(
    TOTAL >= 30 
  ) %>% 
  select(YR, THCIC_ID, PROVIDER_NAME, ERATE)
  
# pivot for table
epi_rate_hosp_yr_table <- epi_rate_hosp_yr %>% 
  pivot_wider(names_from = YR, values_from = ERATE) %>% 
  arrange(`2019` %>% desc())

#peek at table
epi_rate_hosp_yr_table %>% 
  datatable()

Episiotomy rate by year for Laredo

epi_rate_hosp_yr %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))

Either Episiotomy or Cesarean

This looks to see if a mother had either a cesarean or an episiotomy. All the same things apply: All doctors, all complications, but hospitals with fewer than 120 deliveries in our time period are excluded.

epicsec_rate_hosp <- del_csec_epi_suppflag %>% 
  mutate(
        EPICSEC = if_else((EPI == T | PRCSECP) == T, T, F)
  ) %>% 
  count(THCIC_ID, PROVIDER_NAME, EPICSEC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPICSEC, values_from = CASES) %>% 
  rename(
    EPICSEC = "TRUE",
    NEPICSEC = "FALSE"
  ) %>% 
  mutate(
    TOTAL = EPICSEC + NEPICSEC,
    ECRATE = round_half_up((EPICSEC / TOTAL * 100), 1)
  ) %>% 
  filter(TOTAL >= 120) %>% 
  arrange(ECRATE %>% desc())

epicsec_rate_hosp %>% 
  datatable()

Either episiotomy or Cesarean for Laredo

epicsec_rate_hosp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))

Episiotomy or Cesarean rate by hospital by year

This uses all deliveries. Hospital excluded if fewer than 30 deliveries a year.

# prep data for table
epicsec_rate_hosp_yr <- del_csec_epi_suppflag %>% 
  mutate(
    EPICSEC = if_else((EPI == T | PRCSECP) == T, T, F)
  ) %>% 
  count(THCIC_ID, YR, PROVIDER_NAME, EPICSEC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPICSEC, values_from = CASES) %>% 
  rename(
    EPICSEC = "TRUE",
    NEPICSEC = "FALSE"
  ) %>% 
  mutate(
    TOTAL = EPICSEC + NEPICSEC,
    ECRATE = round_half_up((EPICSEC / TOTAL * 100), 1)
  ) %>% 
  filter(TOTAL >= 30)

# pivot data for table
epicsec_rate_hosp_yr_table <- epicsec_rate_hosp_yr %>% 
  select(THCIC_ID, YR, PROVIDER_NAME, ECRATE) %>% 
  pivot_wider(names_from = YR, values_from = ECRATE) %>% 
  arrange(`2019` %>% desc())

epicsec_rate_hosp_yr_table %>% 
  datatable()

Episiotomy or Cesarean rate by year for Laredo

epicsec_rate_hosp_yr_table %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo"))

Get Medicaid rate 2019

One value we don’t have yet is the percentage of patients using Medicaid.

mediciad_2019 <- del_csec_epi_suppflag %>% 
  # filter for 2019
  filter(
    YR == 2019
  ) %>% 
  # create MC col
  mutate(
    MC = if_else(FIRST_PAYMENT_SRC == "MC", T, F),
    MC = if_else(is.na(MC), F, MC)
  ) %>% 
  # group and count MC
  group_by(THCIC_ID, PROVIDER_NAME, PROVIDER_CITY) %>% 
  count(MC) %>% 
  # pivot to shape to get rate
  pivot_wider(names_from = MC, values_from = n) %>% 
  # get rate
  mutate(
    MCRATE = ((`TRUE` / (`TRUE` + `FALSE`)) * 100) %>% round_half_up(1)
  )

# peek as table
mediciad_2019 %>% datatable()

2019 by hospital summary

Building a single dataframe that has summary data by hospital. This is for the interactive. Before we can join them all, we need to pare some calcs down to 2019 only.

Get epi rate for 2019

Extrating 2019 for joining later.

epi_rate_hosp_2019 <- epi_rate_hosp_yr %>% 
  filter(
    YR == 2019
  ) %>% 
  ungroup() %>% 
  select(THCIC_ID, ERATE)

Get cesarean rate for 2019

csec_hosp_rate_2019 <- csec_hosp_rate_yr %>% 
  filter(
    YR == 2019
  ) %>% 
  ungroup() %>% 
  select(THCIC_ID, CRATE)

Quality checks

Difference when removing infrequent deliveries

In this view I remove doctors with fewer than 10 deliveries in a given year to give a more accurate look at the staff doctors that do the bulk of deliveries. Suppressed doctor records are removed.

The choice of 10 deliveries as a threshold is arbitrary and could be changed if we deem necessary.

del_hosp_doc_freq <- del_supp %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME, ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  summarize(DELIVERIES = n()) %>% 
  rename(PHYSICIAN = ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  filter(DELIVERIES >= 10) %>% 
  group_by(THCIC_ID, PROVIDER_NAME) %>% 
  summarize(
    DELIVERIES = sum(DELIVERIES),
    PHYSICIANS = n_distinct(PHYSICIAN)
  ) %>% 
  mutate(
    DRATE = round_half_up(DELIVERIES/PHYSICIANS,1)
  ) %>% 
  arrange(DRATE %>% desc())

del_hosp_doc_freq %>% 
  datatable()

Deliveries per doctor at Laredo hospitals

Delivery rate per doctor by year. Unknown doctors are suppressed. Infrequent deliveries (<20/year) are filtered out.

del_supp %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo")) %>%
  group_by(YR, THCIC_ID, PROVIDER_NAME, ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  summarize(DELIVERIES = n()) %>% 
  rename(PHYSICIAN = ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  filter(DELIVERIES >= 10) %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME) %>% 
  summarize(
    DELIVERIES = sum(DELIVERIES),
    PHYSICIANS = n_distinct(PHYSICIAN)
  ) %>% 
  mutate(
    RATE = round_half_up(DELIVERIES/PHYSICIANS, 1)
  ) %>% 
  select(YR, THCIC_ID, PROVIDER_NAME, RATE) %>% 
  pivot_wider(names_from = YR, values_from = RATE)

Just how much does doctor suppression affect these rates in Laredo? We’ll do the same calculation on the data but without suppressing unknown doctors. Note this has limitations because multiple doctors could be counted in the two suppression values.

del %>% 
  filter(str_detect(PROVIDER_NAME, "Laredo")) %>%
  group_by(YR, THCIC_ID, PROVIDER_NAME, ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  summarize(DELIVERIES = n()) %>% 
  rename(PHYSICIAN = ATTENDING_PHYSICIAN_UNIF_ID) %>% 
  filter(DELIVERIES >= 10) %>% 
  group_by(YR, THCIC_ID, PROVIDER_NAME) %>% 
  summarize(
    DELIVERIES = sum(DELIVERIES),
    PHYSICIANS = n_distinct(PHYSICIAN)
  ) %>% 
  mutate(
    RATE = round_half_up(DELIVERIES/PHYSICIANS, 1)
  ) %>% 
  select(YR, THCIC_ID, PROVIDER_NAME, RATE) %>% 
  pivot_wider(names_from = YR, values_from = RATE)

Summaries

Medicaid summary

Get the rate of medicaid deliveries statewide.

medi_rate_tx_yr_summary <- del %>% 
  mutate(
    MC = if_else(FIRST_PAYMENT_SRC == "MC", T, F),
    MC = if_else(is.na(MC), F, MC)
  ) %>% 
  count(YR, MC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = MC, values_from = CASES) %>% 
  rename(MCF = `FALSE`, MCT = `TRUE`) %>% 
  mutate(
    SUMMARY = "TX",
    CATEGORY = "MEDICARE",
    MEASUREMENT = "RATE",
    TOTAL = MCF + MCT,
    VALUE = round_half_up((MCT / TOTAL) * 100, 1) # RATE
  ) %>% 
  select(YR, SUMMARY, CATEGORY, MEASUREMENT, VALUE)

medi_rate_tx_yr_summary

Cesarean summary

Get the rate of cesearean deliveries across the state.

csec_rate_tx_yr_summary <- del_csec %>% 
  group_by(YR, PRCSECP) %>% 
  count(PRCSECP) %>% 
  rename(CASES = n) %>%
  pivot_wider(names_from = PRCSECP, values_from = CASES) %>% 
  rename(NCSEC = "FALSE", CSEC = "TRUE") %>% 
  mutate(
    SUMMARY = "TX",
    CATEGORY = "CESAREAN",
    MEASUREMENT = "RATE",
    TOTAL = NCSEC + CSEC,
    VALUE = round_half_up((CSEC / TOTAL) * 100, 1) # RATE
  ) %>% 
  select(YR, SUMMARY, CATEGORY, MEASUREMENT, VALUE)

csec_rate_tx_yr_summary

Episiotomy & Cesarean summary

Get the percentage of deliveries where either an episiotomy or cesarean were performed.

epicsec_rate_tx_yr_summary <- del_csec_epi %>% 
  mutate(
        EPICSEC = if_else((EPI == T | PRCSECP) == T, T, F)
  ) %>% 
  group_by(YR, EPICSEC) %>% 
  count(EPICSEC) %>% 
  rename(CASES = n) %>% 
  pivot_wider(names_from = EPICSEC, values_from = CASES) %>% 
  rename(
    EPICSEC = "TRUE",
    NEPICSEC = "FALSE"
  ) %>% 
  mutate(
    SUMMARY = "TX",
    CATEGORY = "EPI_OR_CSEC",
    MEASUREMENT = "RATE",
    TOTAL = EPICSEC + NEPICSEC,
    VALUE = round_half_up((EPICSEC / TOTAL * 100), 1) # RATE
  ) %>% 
  select(YR, SUMMARY, CATEGORY, MEASUREMENT, VALUE)

epicsec_rate_tx_yr_summary

Episiotomy or Cesarean rate by hospital: Averaged, by year

epicsec_rate_hosp_yr_summary <- epicsec_rate_hosp_yr %>% 
  ungroup() %>% 
  group_by(YR) %>% 
  summarize(
    SUMMARY = "HOSPITAL",
    CATEGORY = "EPI_OR_CSEC",
    MEASUREMENT = "MEAN_OF_RATE",
    # MEDIAN = median(RATE),
    VALUE = round_half_up(mean(ECRATE, na.rm = TRUE),1) # MEAN OF RATE
  )

epicsec_rate_hosp_yr_summary

Data for Interactive: Hospital blurbs

Here we start with cleaned hospitals list and combine to get the medicare rate, deliveries per week and maternal levels. Some values were calculated from data where doctors were suppressed, and so may be NA.

The maternal levels were manually collected into a spreadsheet (and updated Nov. 24, 2020) and then used here.

# Import maternal levels

maternal_url <- "https://docs.google.com/spreadsheets/d/1efUxctHiY4cCgrqqHvBVx7jZobbx5Dgmpbf_YCd7tcE/gviz/tq?tqx=out:csv"
maternal_levels <- read_csv(maternal_url) %>% 
  mutate(
    THCIC_ID = as.character(THCIC_ID)
  )
## Parsed with column specification:
## cols(
##   THCIC_ID = col_double(),
##   PROVIDER_NAME = col_character(),
##   LEVEL = col_character()
## )
blurbs_2019 <- providers_full %>% 
  rename(PROVIDER_NAME = PROVIDER_NAME_CLEANED) %>% 
  select(-PROVIDER_CITY) %>% 
  left_join(
    mediciad_2019 %>%
      ungroup() %>% 
      select(THCIC_ID, MCRATE),
    by = "THCIC_ID") %>% 
  # get deliveries per wk
  left_join(
    del_doc_hosp_2019 %>%
      ungroup() %>%
      select(THCIC_ID, DELSPERWK),
      by = "THCIC_ID"
    ) %>% 
  left_join(maternal_levels %>% select(-PROVIDER_NAME))
## Joining, by = "THCIC_ID"
blurbs_2019 %>% head(10)

Writing files and closing out

For export:

Plus summary files for another notebook.

if (test_flag == F) medi_rate_tx_yr_summary %>%
  write_rds("data-processed/medi_rate_tx_yr_summary.rds")

if (test_flag == F) csec_rate_tx_yr_summary %>%
  write_rds("data-processed/csec_rate_tx_yr_summary.rds")

if (test_flag == F) epicsec_rate_tx_yr_summary %>%
  write_rds("data-processed/epicsec_rate_tx_yr_summary.rds")

if (test_flag == F) epicsec_rate_hosp_yr_summary %>%
  write_rds("data-processed/epicsec_rate_hosp_yr_summary.rds")

if (test_flag == F) blurbs_2019 %>%
  arrange(PROVIDER_NAME) %>% 
  write_json("exports/blurbs_2019.json")

# A klaxon to indicate the processing is complete
beepr::beep(4)