All indoor logs cleaning

Author

Media Innovation Group

We have recorded indoor temperatures for each unit April through September of 2022, 2023, 2024 and 2025. This information was acquired by Lauren McGaughy at KUT through a public information request to the Texas Department of Criminal Justice.

The original PDFs were converted into xlsx files and downloaded as csv files using Adobe Acrobat, resulting in the csv files found in ‘data-original/indoor_all_csv’ as ‘SB1R56 - April 2022 (Table 1).csv’ and so on for each month and year. An exception to this is the data for July-September of 2025, which were originally provided in xlsx form and follow the naming convention ‘07.2025 - SB1R56.xlsx’ and so on. These spreadsheets and all other original documents are found in ‘data-original’ under ‘2022_indoor’, ‘2023_indoor’ and so on for each year.

Goals of this notebook

We have already cleaned the data for each year individually in separate notebooks. In this notebook, we will consolidate the cleaning code for each year of data and combine all the years of data into one data frame. This data comes from a public information request to the Texas Department of Criminal Justice. We have a separate csv file for each month from April to September, so we will import each in individually and pipe it into the function we created above. In 2025, the data was given to us in spreadsheet form for July through September, which we will read in using read_excel and follow the same process above.

Clean and combine the 2022, 2023, 2024 and 2025 indoor logs for April, May, June, July, August and September.

Setup

Expand this to see code
library(tidyverse)
library(janitor)
library(lubridate)
library(readxl)

show_col_types = FALSE

Creating a function

This function streamlines the process of pivoting the data longer, skipping the first row, creating a real date and renaming/selecting columns for each month of data.

Expand this to see code
pivot_date <- function(df, mo, yr) {
  df |> pivot_longer(
    !1
  ) |> 
  mutate(
    date = mdy(paste(mo, name, yr))
  ) |> select(
    unit = 1,
    date,
    temperature = value
  )
}

*Note: this function leaves a row for each day of the month where unit = “Month Year” and temperature = NA. For the 2022 logs only, unit = “Month Year - Daily Temperature Log”. i.e. for April 2023, there are 30 rows where the unit is “April 2023” and the temperature is NA, one for each day of the month. The only month where this does not apply is April 2022 because of the different way it was read in.

2022 indoor logs

**Note: For the April 2022 csv only, we do not need to skip the first row because for some reason it converted differently when we put the original pdf into Adobe Acrobat. For all other logs, we will skip the first row in the read_csv() function to avoid formatting problems.

Downloading data and applying function

Expand this to see code
april_2022_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - April 2022 (1)(Table 1).csv") |> pivot_date("April", "2022") |> filter(!unit == "April 2022 - Daily Temperature Log", !is.na(unit)) 
New names:
Rows: 69 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
april_2022_clean 
Expand this to see code
may_2022_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - May 2022(Table 1).csv", 
         skip = 1) |> pivot_date("May", "2022") |> filter(!unit == "May 2022 - Daily Temperature Log", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
june_2022_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - June 2022(Table 1).csv", skip = 1) |> pivot_date("June", "2022") |> filter(!unit == "June 2022 - Daily Temperature Log", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
july_2022_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - July 2022(Table 1).csv", skip = 1) |> pivot_date("July", "2022") |> filter(!unit == "July 2022 - Daily Temperature Log", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
august_2022_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - August 2022(Table 1).csv", skip = 1) |> pivot_date("August", "2022") |> filter(!unit == "August 2022 - Daily Temperature Log",!is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
september_2022_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - September 2022(Table 1).csv", skip = 1) |> pivot_date("September", "2022") |> filter(!unit == "September 2022 - Daily Temperature Log",!is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

Combining each month into one data frame

Expand this to see code
indoor_2022_combined <- bind_rows(april_2022_clean, may_2022_clean, june_2022_clean, july_2022_clean, august_2022_clean, september_2022_clean)

2023 indoor logs

**Note: There are two dates for units in 2023 where the temperature value is NA:

  • Huntsville 2023-08-25
  • Segovia 2023-06-01

For these dates and units, there was no value included the original pdfs from TDCJ (the cells are blank).

Downloading data and applying function

Expand this to see code
april_2023_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - April 2023(Table 1).csv", skip = 1) |> pivot_date("April", "2023") |> filter(!unit == "April 2023", !is.na(unit))
New names:
Rows: 69 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
may_2023_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - May 2023(Table 1).csv", 
         skip = 1) |> pivot_date("May", "2023") |> filter(!unit == "May 2023", !is.na(unit))
New names:
Rows: 69 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

**Important note: we manually removed an extra space from cell Young:26 in the original May 2023 csv because it was causing the column to read as characters instead of doubles.

Expand this to see code
june_2023_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - June 2023(Table 1).csv", skip = 1) |> pivot_date("June", "2023") |> filter(!unit == "June 2023", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

**Important note: we manually removed an extra space from cell Murray:30 in the original June 2023 csv because it was causing the column to read as characters instead of doubles.

Expand this to see code
july_2023_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - July 2023(Table 1).csv", skip = 1) |> pivot_date("July", "2023") |> filter(!unit == "July 2023", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
august_2023_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - August 2023(Table 1).csv", skip = 1) |> pivot_date("August", "2023") |> filter(!unit == "August 2023", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
september_2023_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - September 2023(Table 1).csv", skip = 1) |> pivot_date("September", "2023") |> filter(!unit == "September 2023", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

Combining each month into one data frame

Expand this to see code
indoor_2023_combined <- bind_rows(april_2023_clean, may_2023_clean, june_2023_clean, july_2023_clean, august_2023_clean, september_2023_clean) 

2024 indoor logs

april_2024_clean |> filter(unit == “April 2024”)

april_2024_clean |> filter(is.na(unit))

Downloading data and applying function

Expand this to see code
april_2024_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - April 2024(Table 1).csv", skip = 1) |> pivot_date("April", "2024") |> filter(!unit == "April 2024", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
may_2024_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - May 2024(Table 1).csv", 
         skip = 1) |> pivot_date("May", "2024") |> filter(!unit == "May 2024", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
june_2024_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - June 2024(Table 1).csv", skip = 1) |> pivot_date("June", "2024") |> filter(!unit == "June 2024", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

**Important note: we manually removed an apostrophe from cell Memorial:22 in the original June 2024 csv because it was causing the column to read as characters instead of doubles.

Expand this to see code
july_2024_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - July 2024(Table 1).csv", skip = 1) |> pivot_date("July", "2024") |>  filter(!unit == "July 2024", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
august_2024_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - August 2024(Table 1).csv", skip = 1) |> pivot_date("August", "2024") |> filter(!unit == "August 2024", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
september_2024_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - September 2024(Table 1).csv", skip = 1) |> pivot_date("September", "2024") |> filter(!unit == "September 2024", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

**Important note: we manually removed an extra space from Connally:21 and 22 in the original September 2024 csv because it was causing the column to read as characters instead of doubles.

Combining each month into one data frame

Expand this to see code
indoor_2024_combined <- bind_rows(april_2024_clean, may_2024_clean, june_2024_clean, july_2024_clean, august_2024_clean, september_2024_clean) 

2025 indoor logs

Downloading data and applying function

Expand this to see code
april_2025_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - April 2025(Table 1).csv", skip = 1) |> pivot_date("April", "2025") |> filter(!unit == "April 2025", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
may_2025_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - May 2025(Table 1) (1).csv", skip = 1) |> pivot_date("May", "2025") |> filter(!unit == "May 2025", !is.na(unit))
New names:
Rows: 70 Columns: 32
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (31): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
Expand this to see code
june_2025_clean <- read_csv("data-original/indoor_all_csv/SB1R56 - June 2025(Table 1).csv", skip = 1) |> pivot_date("June", "2025") |> filter(!unit == "June 2025", !is.na(unit))
New names:
Rows: 70 Columns: 31
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): ...1 dbl (30): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19,...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`

**Important note: we manually removed an extra space from Bartlett:8 in the original June 2025 csv because it was causing the column to read as characters instead of doubles.

Expand this to see code
july_2025_clean <- read_excel("data-original/2025_indoor/07.2025 - SB1R56.xlsx", skip = 1, range = "A2:AF70") |> pivot_date("July", "2025") |> filter(!unit == "July 2025", !is.na(unit))
New names:
• `` -> `...1`
Expand this to see code
august_2025_clean <- read_excel("data-original/2025_indoor/08.2025 - SB1R56.xlsx", skip = 1, range = "A2:AF70") |> pivot_date("August", "2025") |> filter(!unit == "August 2025", !is.na(unit))
New names:
• `` -> `...1`
Expand this to see code
september_2025_clean <- read_excel("data-original/2025_indoor/09.2025 - SB1R56.xlsx", skip = 1, range = "A2:AE70") |> pivot_date("September", "2025") |> filter(!unit == "September 2025", !is.na(unit))
New names:
• `` -> `...1`

**Important note: we manually removed an extra space from cell O58 (Stevenson:14) in the original September 2025 csv because it was causing the column to read as characters instead of doubles.

Combining each month into one data frame

Expand this to see code
indoor_2025_combined <- bind_rows(april_2025_clean, may_2025_clean, june_2025_clean, july_2025_clean, august_2025_clean, september_2025_clean)

Combining all years of indoor data into one data frame

Expand this to see code
indoor_temps_all_combined <- bind_rows(indoor_2022_combined, indoor_2023_combined, indoor_2024_combined, indoor_2025_combined) 

indoor_temps_all_combined |> head(50)

Export

Expand this to see code
indoor_temps_all_combined |> write_rds("data-processed/01-indoor-temps-all-cleaned.rds")