Statewide Analysis

Setup

Importing the libraries I will need.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(lubridate)

Read in Clean Data

df <- read_rds("data-processed/01-storm-data.rds")

df |> glimpse()
Rows: 104,554
Columns: 12
$ EVENT_TYPE        <chr> "Winter Storm", "Winter Storm", "Winter Storm", "Win…
$ INJURIES_DIRECT   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ INJURIES_INDIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ DEATHS_DIRECT     <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ DEATHS_INDIRECT   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CZ_TYPE           <chr> "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z…
$ CZ_FIPS           <dbl> 98, 161, 159, 174, 92, 95, 157, 119, 106, 101, 103, …
$ CZ_NAME           <chr> "HASKELL", "LIMESTONE", "MCLENNAN", "MILAM", "COOKE"…
$ damage_val_prop   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ damage_val_crop   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ begin_date        <date> 2000-01-25, 2000-01-25, 2000-01-25, 2000-01-25, 200…
$ end_date          <date> 2000-01-28, 2000-01-28, 2000-01-28, 2000-01-28, 200…

Clean up data more for analysis

I am going to rename the columns I need, drop the ones that are unnecessary for this analysis, and change the column order. I am also going to create a new column, total_damages, that is the sum of damage_val_prop and damage_val_crop, with missing values counted as zero.

storms <- df |> mutate(
  event_type = EVENT_TYPE,
  location = CZ_NAME,
  FIPS = CZ_FIPS,
  CZ_type = CZ_TYPE
)

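# sum() with na.rm = TRUE treats a missing property or crop value as zero
# ungroup() removes the rowwise grouping so it is not carried into the saved file
storms <- storms |> rowwise() |> mutate(
  total_damages = sum(damage_val_prop, damage_val_crop, na.rm = TRUE)
) |> ungroup() |> select(
  event_type,
  location,
  CZ_type,
  FIPS,
  total_damages,
  damage_val_prop,
  damage_val_crop,
  begin_date
)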

storms
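
The row-wise sum above can also be written without rowwise(). This is only a minimal sketch on a small made-up table (toy is hypothetical, not the storm data): it replaces missing values with zero using coalesce() and adds the two columns directly, then compares the result with the rowwise() version.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the storm table
toy <- tibble(
  damage_val_prop = c(1000, NA, 250, NA),
  damage_val_crop = c(NA, 500, 50, NA)
)

# Vectorized total: treat NA as 0, then add the two columns
toy_vec <- toy |>
  mutate(total_damages = coalesce(damage_val_prop, 0) + coalesce(damage_val_crop, 0))

# The rowwise() approach on the same data, for comparison
toy_row <- toy |>
  rowwise() |>
  mutate(total_damages = sum(damage_val_prop, damage_val_crop, na.rm = TRUE)) |>
  ungroup()
```

On a table with more than 100,000 rows, the vectorized form will likely run faster because it does not evaluate the sum one row at a time.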

I am going to export this version of the data for later analysis.

storms |> write_rds("data-processed/02-storm-data.rds")

Answering Statewide Questions

What do damages from storms look like over time?

I am going to group by year and then sum the total damages for each year. Then I sort the table so the years with the highest total damages are at the top.

yr_damages <- storms |> group_by(yr = year(begin_date)) |> 
  summarize(total_damages = sum(total_damages, na.rm = TRUE)) |> 
  arrange(total_damages |> desc())

yr_damages

Now let’s plot it.

ggplot(
  yr_damages,
  aes(x = yr, y = total_damages)
) + geom_col() +
  scale_x_continuous(breaks = seq(2000, 2023, by = 1)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5))
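
Large damage totals can be hard to read on a default axis. As a sketch only, assuming the damage values are in dollars (the source does not state the unit) and using made-up yearly totals (toy_years is hypothetical), the y-axis labels could be shown in billions with scales::label_dollar():

```r
library(ggplot2)

# Hypothetical yearly totals, assumed to be in dollars
toy_years <- data.frame(
  yr = 2000:2003,
  total_damages = c(2.5e8, 1.2e9, 4.0e7, 7.8e9)
)

# Same column chart, with the y axis labelled in billions of dollars
p <- ggplot(toy_years, aes(x = yr, y = total_damages)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_dollar(scale = 1e-9, suffix = "B")) +
  labs(x = "Year", y = "Total damages (billions of dollars)")
```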

Which storm types cause the most damage?

To answer this question, I group by storm event type (hurricane, tornado, etc.) and sum the total damages for each type. Then I sort the table so the event types with the highest total damages are at the top.

# Group by event type and sum the damages
most_damage <- storms |> group_by(event_type) |> 
  summarize(total_damages = sum(total_damages, na.rm = TRUE)) |> 
  arrange(total_damages |> desc())

most_damage

Now I want to graph each event type over time to see if there are any obvious trends. I group by event_type and year, sum the damages for each event type in each year, and keep only the event type and year combinations that had damages.

most_damage_time <- storms |> group_by(event_type, yr = year(begin_date)) |> 
  summarize(total_damages = sum(total_damages, na.rm = TRUE)) |> 
  arrange(total_damages |> desc()) |> 
  filter(total_damages > 0)
`summarise()` has grouped output by 'event_type'. You can override using the
`.groups` argument.
most_damage_time
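
The summarise() message in the output above appears because summarize() drops only the last grouping variable, so the result is still grouped by event_type. A small sketch on made-up data (toy_storms is hypothetical) of how `.groups = "drop"` returns an ungrouped table and avoids the message:

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with one row per storm event
toy_storms <- tibble(
  event_type = c("Hail", "Hail", "Tornado", "Tornado"),
  yr = c(2000, 2000, 2000, 2001),
  total_damages = c(100, 50, 0, 300)
)

# Sum damages per event type and year, then drop all grouping
by_type_year <- toy_storms |>
  group_by(event_type, yr) |>
  summarize(total_damages = sum(total_damages, na.rm = TRUE), .groups = "drop") |>
  filter(total_damages > 0) |>
  arrange(desc(total_damages))
```

Dropping the grouping explicitly means later steps such as arrange() and filter() work on the whole table rather than within each event type.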

Now I am plotting it.

ggplot(
  most_damage_time,
  aes(x = yr, y = total_damages, color = event_type)
) + geom_line()
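
If the plot has many event types, one line per type can be difficult to read. One option, sketched here on made-up data (toy_damage and the cutoff of 2 are hypothetical), is to keep only the event types with the largest overall totals before plotting:

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Hypothetical yearly totals for three event types
toy_damage <- tibble(
  event_type = rep(c("Hail", "Tornado", "Flood"), each = 2),
  yr = rep(c(2000, 2001), times = 3),
  total_damages = c(100, 200, 50, 25, 900, 400)
)

# Find the event types with the largest overall totals
top_types <- toy_damage |>
  group_by(event_type) |>
  summarize(overall = sum(total_damages), .groups = "drop") |>
  slice_max(overall, n = 2) |>
  pull(event_type)

# Plot only those event types
top_plot <- toy_damage |>
  filter(event_type %in% top_types) |>
  ggplot(aes(x = yr, y = total_damages, color = event_type)) +
  geom_line()
```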

Which individual storms caused the most damage?

Here are the top ten storms according to total damage.

sorted_damages <- storms |> arrange(total_damages |> desc())
sorted_damages |> head(10)
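
A top-n table like this can also be produced in one step with slice_max(). A sketch on made-up values (toy_events is hypothetical); note that slice_max() keeps tied rows by default, so with_ties = FALSE is needed to get exactly n rows, as head() would return.

```r
library(dplyr)
library(tibble)

# Hypothetical events, two of which tie for the highest damage
toy_events <- tibble(
  location = c("A", "B", "C", "D"),
  total_damages = c(10, 500, 500, 75)
)

# Default behavior: tied rows are all kept, so n = 1 can return more than one row
top_with_ties <- toy_events |> slice_max(total_damages, n = 1)

# with_ties = FALSE returns exactly n rows
top_exact <- toy_events |> slice_max(total_damages, n = 1, with_ties = FALSE)
```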

Answering the same questions but for damage to property and then crops

In this section, I answer the same questions as above, in the same way, but for damages to property and damages to crops separately instead of total damages.
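
As an alternative to repeating each step for each column, the two damage columns could be stacked into one and summarized together. This is only a sketch on made-up data (toy_storms and the damage_kind column are hypothetical), using pivot_longer() from tidyr:

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical toy data with the same damage columns as the storm table
toy_storms <- tibble(
  begin_date = as.Date(c("2000-01-25", "2000-06-01", "2001-03-10")),
  damage_val_prop = c(1000, NA, 200),
  damage_val_crop = c(NA, 50, 25)
)

# Stack property and crop damages, then sum by year and damage kind
yearly_by_kind <- toy_storms |>
  pivot_longer(
    cols = c(damage_val_prop, damage_val_crop),
    names_to = "damage_kind",
    names_prefix = "damage_val_",
    values_to = "damage"
  ) |>
  group_by(yr = year(begin_date), damage_kind) |>
  summarize(damage = sum(damage, na.rm = TRUE), .groups = "drop")
```

The result has one row per year and damage kind ("prop" or "crop"), which could then be plotted with one panel or color per kind.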

What do damages to property from storms look like over time?

yr_damages_prop <- storms |> group_by(yr = year(begin_date)) |> 
  summarize(total_damages_prop = sum(damage_val_prop, na.rm = TRUE)) |> 
  arrange(total_damages_prop |> desc())

yr_damages_prop

Now let’s plot it.

ggplot(
  yr_damages_prop,
  aes(x = yr, y = total_damages_prop)
) + geom_col() +
  scale_x_continuous(breaks = seq(2000, 2023, by = 1)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5))

Which storm types cause the most damage to property?

# Group by event type and sum the property damages
most_damage_prop <- storms |> group_by(event_type) |> 
  summarize(total_damages_prop = sum(damage_val_prop, na.rm = TRUE)) |> 
  arrange(total_damages_prop |> desc())

most_damage_prop

How have damages to property from different storm types varied over time?

most_damage_time_prop <- storms |> group_by(event_type, yr = year(begin_date)) |> 
  summarize(total_damages_prop = sum(damage_val_prop, na.rm = TRUE)) |> 
  arrange(yr |> desc()) |> 
  filter(total_damages_prop > 0)
`summarise()` has grouped output by 'event_type'. You can override using the
`.groups` argument.
most_damage_time_prop

ggplot(
  most_damage_time_prop,
  aes(x = yr, y = total_damages_prop, color = event_type)
) + geom_line()

Which specific storms cause the most damage to property?

sorted_damages_prop <- storms |> arrange(damage_val_prop |> desc())
sorted_damages_prop |> head(10)

What do damages to crops from storms look like over time?

yr_damages_crop <- storms |> group_by(yr = year(begin_date)) |> 
  summarize(total_damages_crop = sum(damage_val_crop, na.rm = TRUE)) |> 
  arrange(total_damages_crop |> desc())

yr_damages_crop

Now let’s plot it.

ggplot(
  yr_damages_crop,
  aes(x = yr, y = total_damages_crop)
) + geom_col() +
  scale_x_continuous(breaks = seq(2000, 2023, by = 1)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5))

Which storm types cause the most damage to crops?

# Group by event type and sum the crop damages
most_damage_crop <- storms |> group_by(event_type) |> 
  summarize(total_damages_crop = sum(damage_val_crop, na.rm = TRUE)) |> 
  arrange(total_damages_crop |> desc())

most_damage_crop

How have damages to crops from different storm types varied over time?

most_damage_time_crop <- storms |> group_by(event_type, yr = year(begin_date)) |> 
  summarize(total_damages_crop = sum(damage_val_crop, na.rm = TRUE)) |> 
  arrange(yr |> desc()) |> 
  filter(total_damages_crop > 0)
`summarise()` has grouped output by 'event_type'. You can override using the
`.groups` argument.
most_damage_time_crop

ggplot(
  most_damage_time_crop,
  aes(x = yr, y = total_damages_crop, color = event_type)
) + geom_line()

Which specific storms cause the most damage to crops?

sorted_damages_crop <- storms |> arrange(damage_val_crop |> desc())
sorted_damages_crop |> head(10)