Expand this to see code
totals_rep |>
filter(year == 2024, str_detect(name, "Phelan"))We are trying to make a list of all the Texas House candidates since 2016 and list if they won their elections.
Something to know about these election results is it will NOT include candidates that were unopposed in an election. Let’s look at some examples:
We can look to Ballotpedia in 2024 to see why
So there are NO results for District 3 in 2024
totals_rep |>
filter(year == 2024, str_detect(name, "Phelan"))In 2024, Dade Phelan was in a Republican Primary Runoff. He was not originally reflected in the data because we were missing 2024 runoff data from both parties. That has been remedied.
Here we’ll go through each election and we’ll append the race totals and add a flag if the candidate won that race (based on >50% vote.)
totals_win <-
totals_rep |>
group_by(year, office, district, election) |>
mutate(
total_votes = sum(candvotes),
# winner = candidate has > 50% of vote AND the max votes
winner = candvotes == max(candvotes) & candvotes > total_votes / 2,
) |>
ungroup()
totals_win |> nrow()[1] 3029
totals_win |> head()I should export that tibble.
Peek at the list of elections:
totals_win |>
count(year, election)Here we start with our Texas House project specific tasks:
elcts <- c(
"Democratic Primary",
"Democratic Runoff",
"Republican Primary",
"Republican Runoff",
"General"
)
totals_prg <-
totals_win |>
filter(election %in% elcts, year >= 2016) |>
mutate(election = election |> fct(levels = elcts))
print("Showing years")[1] "Showing years"
totals_prg |> count(year)print("Showing included elections")[1] "Showing included elections"
totals_prg |> count(election)While working through this project I found that a candidate’s incumbent flag can change during the same election cycle and there doesn’t seem to be a pattern. Through research I found these to mostly be errors. If left along I would not be able to group on incumbent but that would create other problems as there are two candidates with the same last name in the same Republican Primary and incumbency is the only way to tell them apart.
The only solution I can think of now is to make corrections to these records. We’ll use the Texas Secretary of State results to make decisions. Here are the records in question:
totals_prg |>
group_by(year, office, district, name, party) |>
summarise(n_inc = n_distinct(incumbent), .groups = "drop") |> # counts if different values
filter(n_inc > 1)This block will be our exploratory block.
totals_rep |>
filter(year == 2020, district == 28, name == "Gates")totals_inc_fix <- totals_prg |>
mutate(
incumbent = case_when(
year == 2016 & district == 27 & incumbent == "N" & name == "Reynolds" ~ "Y",
year == 2016 & district == 73 & incumbent == "N" & name == "Miller" ~ "Y",
year == 2016 & district == 92 & incumbent == "N" & name == "Stickland" ~ "Y",
year == 2018 & district == 13 & incumbent == "Y" & name == "Leman" ~ "N",
year == 2020 & district == 100 & incumbent == "N" & name == "Birabil" ~ "Y",
year == 2020 & district == 148 & incumbent == "N" & name == "Eastman" ~ "Y",
year == 2020 & district == 28 & incumbent == "N" & name == "Gates" ~ "Y",
.default = incumbent
)
)
totals_inc_fix |>
group_by(year, office, district, name, party) |>
summarise(n_inc = n_distinct(incumbent), .groups = "drop") |> # counts if different values
filter(n_inc > 1)We are now left with just the Simmons pair from District 65, which are two different people.
For each election (filtered) we need to:
I should probably write a function for the filter and renaming, but I don’t want to expend the brain power at this time.
dp_data <-
totals_inc_fix |>
filter(election == "Democratic Primary") |>
rename(dp_votes = candvotes,
dp_tvotes = total_votes,
dp_win = winner) |>
select(-election)
dp_data |> nrow()[1] 396
dp_data |> head()dr_data <-
totals_inc_fix |>
filter(election == "Democratic Runoff") |>
rename(# dr_inc = incumbent,
dr_votes = candvotes,
dr_tvotes = total_votes,
dr_win = winner) |>
select(-election)
dr_data |> nrow()[1] 60
dr_data |> head()rp_data <-
totals_inc_fix |>
filter(election == "Republican Primary") |>
rename(# rp_inc = incumbent,
rp_votes = candvotes,
rp_tvotes = total_votes,
rp_win = winner) |>
select(-election)
rp_data |> nrow()[1] 590
rp_data |> head()rr_data <-
totals_inc_fix |>
filter(election == "Republican Runoff") |>
rename(# rr_inc = incumbent,
rr_votes = candvotes,
rr_tvotes = total_votes,
rr_win = winner) |>
select(-election)
rr_data |> nrow()[1] 100
rr_data |> head()Now the general election
g_data <-
totals_inc_fix |>
filter(election == "General") |>
rename(# g_inc = incumbent,
g_votes = candvotes,
g_tvotes = total_votes,
g_win = winner) |>
select(-election)
g_data |> head()Let’s try a join.
totals_flags <-
dp_data |>
full_join(dr_data) |>
full_join(rp_data) |>
full_join(rr_data) |>
full_join(g_data)Joining with `by = join_by(year, office, district, name, party, incumbent)`
Joining with `by = join_by(year, office, district, name, party, incumbent)`
Joining with `by = join_by(year, office, district, name, party, incumbent)`
Joining with `by = join_by(year, office, district, name, party, incumbent)`
totals_flags |> nrow()[1] 1723
totals_flags |> head()totals_export <- totals_flags |>
select(!ends_with("votes"))
totals_export |> glimpse()Rows: 1,723
Columns: 11
$ year <chr> "2016", "2016", "2016", "2016", "2016", "2016", "2016", "201…
$ office <chr> "State Rep 110", "State Rep 110", "State Rep 116", "State Re…
$ district <dbl> 110, 110, 116, 116, 116, 118, 118, 120, 120, 120, 120, 120, …
$ name <chr> "Rose", "Crenshaw", "Arevalo", "Golando", "Resendez", "Urest…
$ party <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", …
$ incumbent <chr> "Y", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", …
$ dp_win <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, …
$ dr_win <lgl> NA, NA, NA, NA, NA, NA, NA, NA, TRUE, FALSE, NA, NA, NA, NA,…
$ rp_win <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ rr_win <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ g_win <lgl> NA, NA, NA, NA, NA, TRUE, NA, NA, TRUE, NA, NA, NA, NA, NA, …
Filtering by election and then joining all those back is pretty major. Let’s make sure our result has what we expect.
# Individuals before we split apart and rejoined
totals_inc_fix |> distinct(year, office, district, name, office, party, incumbent) |> nrow()[1] 1723
# number of rows in our flagged data
totals_export |> nrow()[1] 1723
We’ll pick one our districts where incumbency was wrong and compare our original data to when we created totals and to our final product.
# our source data
totals_rep |>
filter(year == 2016, district == 27)# with winners
totals_win |>
filter(year == 2016, district == 27)# our processed data
totals_export |>
filter(year == 2016, district == 27)And then we have our problem child (or children) with the 2018 District 65 Republican Primary race with two people named Simmons.
# our source data
totals_rep |>
filter(year == 2018, district == 65)# with winners
totals_win |>
filter(year == 2018, district == 65)# our processed data
totals_export |>
filter(year == 2018, district == 65)Beckley shows no dp_win because she was unopposed in that election and therefor not in our data.
Could there be other potential issues if person of the same name ran in the same race.
totals_prg |>
count(year, election, office, district, name, sort = T) |>
filter(n > 1)totals_prg |>
filter(district == 41, name == "Guerra")There is another case in District 41 where two people named Guerra ran against each other in the General elections of 2020, 2022 and 2024. They are in different parties.
We’ll peek at the 2022 results.
# our source data
totals_rep |>
filter(year == 2022, district == 41)# with winners
totals_win |>
filter(year == 2022, district == 41)# our processed data
totals_export |>
filter(year == 2022, district == 41)I think this is the best we can do for now. I’m still worried that we have only last names and that will prove a problem with Simmons and Guerra when we try to join with ballot and spending, but it is the best we have for right now.
totals_export |> write_csv("data-export/02-house-winners.csv")
totals_export |> write_rds("data-processed/02-house-winners.rds")