This notebook attempts to aggregate Riverside County, CA testing counts in a format that will work well with the aggregations provided by San Diego County, CA.

San Diego aggregations

It’s likely that we’ll reshape the San Diego data as well. For instance, it is not tidy to have a separate column for each test result. There should be a single column holding the value of that result.

Here is what we will aim for with each data set:

  • date_week_end
  • zip
  • test_result
  • count

(I’m not sure if the “name” – which I have renamed “place” – will come into play here.)
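
As a rough sketch of the reshaping described above, a table with one column per test result can be gathered into the target columns with pivot_longer(). The wide table here is made up for illustration: the `positive` and `negative` column names and the counts are assumptions, not the actual San Diego layout.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per test result.
# Column names and values are invented for this sketch.
wide <- tibble(
  date_week_end = as.Date(c("2020-01-04", "2020-01-04")),
  zip = c("92101", "92102"),
  positive = c(2L, 0L),
  negative = c(10L, 7L)
)

# Gather the result columns into a single test_result column,
# with the matching value in count.
tidy <- wide %>%
  pivot_longer(
    cols = c(positive, negative),
    names_to = "test_result",
    values_to = "count"
  )

tidy
```

Each zip and week now has one row per test result, which matches the four target columns listed above.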

The Riverside County data, however, has one row per individual test, so we need to aggregate it by week. The notebook 02-time-study.Rmd figures out how to do this, and we apply that result here.
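
To make the weekly bucketing concrete, here is a minimal sketch of the week-ending calculation on a few hand-picked dates. It assumes lubridate's default week start (Sunday), which is passed explicitly here so the behavior is visible. The sample dates are chosen for illustration only.

```r
library(lubridate)

# A Saturday, the following Sunday, a midweek day, and the next Saturday.
dates <- as.Date(c("2020-01-04", "2020-01-05", "2020-01-08", "2020-01-11"))

# ceiling_date() rounds a Date up to the next week boundary (a Sunday here);
# subtracting one day gives the Saturday that ends the week.
week_end <- ceiling_date(dates, "week", week_start = 7) - 1

week_end
# "2020-01-04" "2020-01-11" "2020-01-11" "2020-01-11"
```

Note that a Date sitting exactly on the boundary (Sunday, 2020-01-05) is rounded up to the next boundary rather than left in place, so it falls into the week ending 2020-01-11.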

library(tidyverse)
library(lubridate)

Import

This data comes from the result of 01-clean.Rmd.

riverside <- read_rds("data-processed/riverside.rds")

riverside %>% glimpse()
## Rows: 1,967,201
## Columns: 4
## $ lab_date   <date> 2020-01-04, 2020-01-04, 2020-01-04, 2020-01-04, 2020-01-0…
## $ place      <chr> "TEMECULA", "MURRIETA", "INDIO", "INDIO", "HEMET", "RIVERS…
## $ zipcode    <chr> "92591", "92563", "92203", "92203", "92543", "92503", "928…
## $ lab_result <chr> "Negative", "Negative", "Negative", "Positive", "Negative"…

Create our zipcode aggregations

weekly_cnt <- riverside %>% 
  count(
    # week-ending date: the day before the next week boundary
    week = ceiling_date(lab_date, "week") - 1,
    zipcode,
    lab_result
  ) %>% 
  rename(
    tests = n
  )

weekly_cnt %>% glimpse()
## Rows: 9,033
## Columns: 4
## $ week       <date> 2020-01-04, 2020-01-04, 2020-01-04, 2020-01-04, 2020-01-0…
## $ zipcode    <chr> "91752", "92203", "92203", "92211", "92223", "92503", "925…
## $ lab_result <chr> "Negative", "Negative", "Positive", "Positive", "Positive"…
## $ tests      <int> 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2…

Reshape by test results

weekly_results <- weekly_cnt %>% 
  pivot_wider(
    names_from = lab_result,
    values_from = tests
  ) %>% 
  arrange(week, zipcode)

## peek at results
weekly_results
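
One thing to watch in the widened table: a zipcode with no tests of a given result in a week gets NA in that column, because no row existed for that combination. If zeros are preferred, a possible variation is to pass values_fill to pivot_wider(). This is a sketch on a toy table, not a change to the pipeline above. The toy rows are invented.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy weekly counts: zipcode 92211 has no "Negative" row.
toy_cnt <- tibble(
  week = as.Date(c("2020-01-04", "2020-01-04", "2020-01-04")),
  zipcode = c("92203", "92203", "92211"),
  lab_result = c("Negative", "Positive", "Positive"),
  tests = c(1L, 1L, 2L)
)

# values_fill = 0L writes a zero where a week/zipcode/result
# combination is absent, instead of NA.
toy_wide <- toy_cnt %>%
  pivot_wider(
    names_from = lab_result,
    values_from = tests,
    values_fill = 0L
  ) %>%
  arrange(week, zipcode)

toy_wide
```

Whether NA or 0 is the better choice depends on how the downstream comparison with the San Diego data treats missing values.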

Quick eyeball test of aggregation

Look at the first date in the raw data and compare it to the top of the table above.

riverside %>% 
  filter(
    lab_date == "2020-01-04"
  ) %>% 
  arrange(zipcode)
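
Beyond eyeballing, a simple programmatic check is that the weekly counts add back up to the number of raw rows, since every test should land in exactly one week/zipcode/result bucket. The sketch below shows the idea on a toy table with invented rows. Applying it to the real data would mean comparing the summed tests column against the row count of riverside.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Toy stand-in for the raw, one-row-per-test data.
toy_raw <- tibble(
  lab_date = as.Date(c("2020-01-04", "2020-01-04", "2020-01-06", "2020-01-07")),
  zipcode = c("92203", "92203", "92503", "92503"),
  lab_result = c("Negative", "Positive", "Negative", "Negative")
)

# Same aggregation as above; name = "tests" names the count column directly.
toy_weekly <- toy_raw %>%
  count(
    week = ceiling_date(lab_date, "week") - 1,
    zipcode,
    lab_result,
    name = "tests"
  )

# Every raw row should be counted exactly once.
stopifnot(sum(toy_weekly$tests) == nrow(toy_raw))
```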

Export the file

weekly_results %>%
  write_csv("data-processed/riverside_weekly_results.csv")