tibble::tribble(
~year, ~district, ~name, ~party, ~candvotes,
"2016", 46, "Dukes", "D", 37457,
"2016", 46, "Greeley", "G", 2178,
"2016", 46, "Ludlow", "L", 3445,
"2016", 46, "Nila", "R", 10209 ) Margins prompt
To figure out the top candidate and the runner up, I turned to ChatGPT. This is the prompt and return.
Prompt
I have the following data in R:
And I want new columns that do the following:
- the name of the winning candidate (highest candvotes value).
- the percentage of the total vote that the winning candidate received
- the name of the second highest candidate (using candvotes)
- the percentage of the total vote that second candidate got
- the difference betwen the two percentages
The result
You can do this cleanly with dplyr by working within each year–district, ranking candidates by votes, and then summarizing.
Here’s a tidyverse solution that gives one row per race with all the fields you want.
library(dplyr)
df <- tibble::tribble(
~year, ~district, ~name, ~party, ~candvotes,
"2016", 46, "Dukes", "D", 37457,
"2016", 46, "Greeley", "G", 2178,
"2016", 46, "Ludlow", "L", 3445,
"2016", 46, "Nila", "R", 10209
)
result <- df %>%
group_by(year, district) %>%
mutate(
total_votes = sum(candvotes),
pct = candvotes / total_votes * 100
) %>%
arrange(desc(candvotes), .by_group = TRUE) %>%
summarise(
winner = first(name),
winner_pct = first(pct),
runner_up = nth(name, 2),
runner_up_pct = nth(pct, 2),
pct_diff = winner_pct - runner_up_pct,
.groups = "drop"
)
resultOutput
# A tibble: 1 × 7
year district winner winner_pct runner_up runner_up_pct pct_diff
<chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 2016 46 Dukes 70.2 Nila 19.2 51.0
Why this works
group_by(year, district)→ treats each race separatelytotal_votes→ denominator for percentagesarrange(desc(candvotes), .by_group = TRUE)→ ensures winners are firstfirst()andnth(2)→ safely grab top two finisherspct_diff→ margin of victory in percentage points
Optional: keep party labels too
If you want winner/runner-up party as well:
summarise(
winner = first(name),
winner_party = first(party),
winner_pct = first(pct),
runner_up = nth(name, 2),
runner_up_party = nth(party, 2),
runner_up_pct = nth(pct, 2),
pct_diff = winner_pct - runner_up_pct,
.groups = "drop"
)If you want this to scale to multi-year, multi-district election data, this pattern will hold without modification.