Texas Death Row

Goals

Scrape two tables from the Texas Department of Criminal Justice website: the list of executed offenders and the list of offenders currently on death row.

This is another example of pulling HTML tables from a page with rvest.

Setup

library(tidyverse)
library(janitor)
library(rvest)

Working through the exercise

Executed Offenders

Get the HTML tables from the page

# gets the tables from the page as a list
executed_tables <- read_html("https://www.tdcj.texas.gov/death_row/dr_executed_offenders.html") |> 
  html_table()

# selects the first table from the list and cleans headers
executed_raw <- executed_tables[[1]] |> clean_names()

executed_raw
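
To see what these two steps do without hitting the live site, here is a minimal sketch that runs the same pipeline on a small inline page. The HTML, the names, and the values are made up for illustration and are not TDCJ data; `demo_page`, `demo_tables`, and `demo_raw` are hypothetical object names.

```r
library(rvest)
library(janitor)

# A made-up page with a single table (placeholder values, not real records)
demo_page <- minimal_html('
  <table>
    <tr><th>Last Name</th><th>First Name</th><th>Age</th></tr>
    <tr><td>Doe</td><td>Jane</td><td>41</td></tr>
    <tr><td>Roe</td><td>Richard</td><td>35</td></tr>
  </table>
')

# html_table() on a whole page returns a list with one tibble per <table>
demo_tables <- demo_page |> html_table()
length(demo_tables)

# Select the first table from the list, then convert headers to snake_case
demo_raw <- demo_tables[[1]] |> clean_names()
names(demo_raw)
```

Because `html_table()` returns a list even when the page has only one table, the `[[1]]` step is still needed before `clean_names()`, which turns a header such as "Last Name" into `last_name`.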

Do the same for the table of offenders currently on death row.

deathrow_tables <- read_html("https://www.tdcj.texas.gov/death_row/dr_offenders_on_dr.html") |> 
  html_table()

deathrow_raw <- deathrow_tables[[1]] |> clean_names()

deathrow_raw

Export the files

# write_rds() does not create folders, so data-raw/tdcj must already exist
executed_raw |> write_rds("data-raw/tdcj/executed_raw.rds")
deathrow_raw |> write_rds("data-raw/tdcj/deathrow_raw.rds")
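
The export step can be rehearsed on a throwaway tibble before writing the real files. This is a sketch under a few assumptions: it writes to a temporary folder instead of `data-raw/tdcj`, and `demo_export`, `out_dir`, `out_file`, and `demo_back` are hypothetical names that are not part of the notebook.

```r
library(readr)
library(tibble)

# Hypothetical stand-in for a scraped table (placeholder values)
demo_export <- tibble(last_name = c("Doe", "Roe"), age = c(41L, 35L))

# A temporary folder stands in for data-raw/tdcj; create it if it is missing
out_dir <- file.path(tempdir(), "data-raw", "tdcj")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Write the tibble to an .rds file
out_file <- file.path(out_dir, "demo_export.rds")
demo_export |> write_rds(out_file)

# Read it back and compare with the original
demo_back <- read_rds(out_file)
all.equal(demo_export, demo_back)
```

Saving as `.rds` rather than CSV keeps the column types, so a later notebook can pick up the data with `read_rds()` without having to parse the columns again.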