library(tidyverse)
library(janitor)

This notebook compiles Voting Tabulation District (VTD) election returns from different years, processes them into statewide results, then filters them for State Representative races.
Originally created by MIG data fellow Isabella Zeff, it has since been refactored by Christian McDonald.
The data comes from the Texas Legislative Council’s data portal. The documentation is available here. We used the 2024 General VTDs Election Data CSV version, which includes 2012-2024 election data reported by 2024 primary election VTDs.
library(tidyverse)
library(janitor)

Function to create totals from the county-by-county results:
fun_totals_all <- function(.data) {
  .data |>
    group_by(year, election, office, name, party, incumbent) |>
    summarize(candvotes = sum(votes), .groups = "drop") |>
    arrange(year, office, candvotes |> desc())
}

# pattern is a regular expression, so escape the dot and anchor it to the end of the name
all_files <- list.files(
  "data-original/vdt-returns",
  pattern = "\\.csv$",
  full.names = TRUE
)
# all_files

This makes sure that we have the main results for the time period we are interested in (very specific to the Texas House spending analysis). This does not take special elections into account. At one point we were missing runoff results for 2024.
main_races <- c(
  "Democratic_Primary",
  "Democratic_Runoff",
  "Republican_Primary",
  "Republican_Runoff",
  "General"
)
# Count how many of the main races we have a file for in each year
all_files |>
  as_tibble() |>
  mutate(
    value = str_remove(value, "data-original/vdt-returns/"),
    value = str_remove(value, "_Election_Returns.csv"),
    year = str_sub(value, 1, 4) |> as.numeric(),
    election = str_remove(value, "^\\d{4}_")
  ) |>
  filter(year >= 2016) |>
  filter(election %in% main_races) |>
  count(year, sort = TRUE)

# Read every file as character columns and keep the file name in `source`
all_raw <- all_files |>
  set_names(basename) |>
  map(\(x) read_csv(x, col_types = cols(.default = col_character()))) |>
  list_rbind(names_to = "source") |>
  clean_names()

Here we use the name of the file to find the election year and name. We also turn votes into a number.
all_returns <- all_raw |>
  mutate(
    year = str_sub(source, 1, 4),
    election = str_sub(source, 6, -22) |> str_replace_all("_", " "),
    .before = county
  ) |>
  mutate(votes = votes |> as.numeric()) |>
  select(-source)
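The `str_sub()` offsets above depend on the file naming pattern, so here is a minimal sketch of how they behave. The two file names are hypothetical examples that assume the `<year>_<election>_Election_Returns.csv` pattern implied by the file check earlier; they are not taken from the actual directory listing.

```r
library(stringr)

# Hypothetical file names following the assumed naming pattern
source <- c(
  "2024_General_Election_Returns.csv",
  "2022_Republican_Primary_Election_Returns.csv"
)

# The trailing "_Election_Returns.csv" is 21 characters long,
# so ending the substring at position -22 drops exactly that suffix
suffix_length <- str_length("_Election_Returns.csv")

# Characters 1-4 are the year; the election name starts at character 6
year <- str_sub(source, 1, 4)
election <- str_sub(source, 6, -22) |> str_replace_all("_", " ")

year
election
```

If a file ever used a different suffix, the fixed offset would silently clip the election name, which is one reason to run the file check first.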
all_returns |> head()

all_totals <- all_returns |>
  fun_totals_all() |>
  arrange(year, election, office, candvotes |> desc())
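To see what `fun_totals_all()` does to the rows, here is a small sketch on made-up data. The function body is copied from this notebook so the example runs on its own; the `toy_returns` table, its candidates, and its vote counts are invented for illustration only.

```r
library(dplyr)
library(tibble)

# Copy of the notebook's function so this sketch is self-contained
fun_totals_all <- function(.data) {
  .data |>
    group_by(year, election, office, name, party, incumbent) |>
    summarize(candvotes = sum(votes), .groups = "drop") |>
    arrange(year, office, candvotes |> desc())
}

# Made-up rows: two candidates, each reported in two counties
toy_returns <- tribble(
  ~county, ~year,  ~election, ~office,       ~name,         ~party, ~incumbent, ~votes,
  "A",     "2024", "General", "State Rep 5", "Candidate X", "R",    "Y",        100,
  "B",     "2024", "General", "State Rep 5", "Candidate X", "R",    "Y",        50,
  "A",     "2024", "General", "State Rep 5", "Candidate Z", "D",    "N",        120,
  "B",     "2024", "General", "State Rep 5", "Candidate Z", "D",    "N",        70
)

# Four county-level rows collapse into one row per candidate
toy_totals <- toy_returns |> fun_totals_all()
toy_totals
```

Because `county` is not one of the grouping columns, it drops out of the result and each candidate ends up with a single statewide `candvotes` value.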
all_totals |> head()

Find the state reps in the data.

# parse_number() pulls the district number out of the office name
rep_totals <- all_totals |>
  filter(str_detect(office, "State Rep")) |>
  mutate(district = parse_number(office), .after = office)

I am exporting both the full totals and the rep-only results at this point.
all_totals |>
  write_rds("data-processed/01-all-totals.rds")

rep_totals |>
  write_rds("data-processed/01-house-totals.rds")

This is just to confirm that the reason (or at least a reason) we don’t have results from every House district is unopposed races.
Of all the Texas House results we have, here are the districts for the 2024 general election.
rep_totals |>
  distinct(year, election, district) |>
  arrange(year, election, district) |>
  filter(year == 2024, election == "General")

We are missing 1, 3, 9, 11 for starters.

If we look at results for this election on Ballotpedia, we can see those same races did not have more than one candidate on the ballot. Even a race with a write-in made it (Dist. 5).
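Rather than scanning the list by eye, the gaps could also be listed with `setdiff()`. This is a minimal sketch, not part of the original analysis: `toy_rep_totals` is a made-up stand-in for `rep_totals` that only covers the first 11 districts, with the district numbers chosen to mirror the gaps noted above.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for rep_totals, limited to a handful of districts
toy_rep_totals <- tibble(
  year = "2024",
  election = "General",
  district = c(2, 4, 5, 6, 7, 8, 10)
)

# Districts that have at least one row of results for this election
districts_with_results <- toy_rep_totals |>
  filter(year == "2024", election == "General") |>
  distinct(district) |>
  pull(district)

# Only districts 1-11 are checked in this toy example;
# against the real data the range would be 1:150, since the Texas House has 150 districts
missing_districts <- setdiff(1:11, districts_with_results)
missing_districts
```

The resulting vector can then be compared against an outside list of uncontested races.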
And then if we look at the same for the Republican Primary:
rep_totals |>
  distinct(year, election, district) |>
  arrange(year, election, district) |>
  filter(year == 2024, election == "Republican Primary")

If we compare the list above with Ballotpedia’s primary election list, we see the first district missing is 3, which tracks with Cecil Bell Jr. being the only candidate. District 6 is also unopposed, etc.
For the Dems, you can look at that same list and see there was not a valid primary race for the first 18 districts. Those all had unopposed or zero candidates and the primary was canceled.
rep_totals |>
  distinct(year, election, district) |>
  arrange(year, election, district) |>
  filter(year == 2024, election == "Democratic Primary")