Cleaning Enrollment

library(tidyverse)
library(janitor)
enrollment_file_list <- list.files(
  "data-raw/enrollment",
  pattern = "\\.csv$",  # regex: only names that end in ".csv"
  full.names = TRUE
)

enrollment_file_list
 [1] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2014-2015.csv"
 [2] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2015-2016.csv"
 [3] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2016-2017.csv"
 [4] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2017-2018.csv"
 [5] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2018-2019.csv"
 [6] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2019-2020.csv"
 [7] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2020-2021.csv"
 [8] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2021-2022.csv"
 [9] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2022-2023.csv"
[10] "data-raw/enrollment/Enrollment Report_Statewide_Districts_Gender_2023-2024.csv"
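`list.files()` treats `pattern` as a regular expression, so an unanchored `".csv"` matches more than CSV files: the `.` matches any character and the match can occur anywhere in the name. The base-R check below uses hypothetical file names (not files from this project) to show the difference between the loose pattern and an anchored one.

```r
# Hypothetical file names, used only to show how list.files() patterns behave
files <- c(
  "Enrollment_2014-2015.csv",
  "notes_csv_export.txt",
  "Enrollment_2015-2016.csv.bak",
  "Enrollment_2023-2024.csv"
)

# Unanchored ".csv": "." matches any character and the match can sit anywhere
loose <- grepl(".csv", files)

# "\\.csv$": a literal dot, then "csv", at the very end of the name
strict <- grepl("\\.csv$", files)

files[loose]   # all four names
files[strict]  # only the two names that really end in .csv
```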
importing_cleaning <- function(file_name) {
  file_name |>
    read_csv(skip = 4) |>                        # read the CSV, skipping the TEA heading lines above the column names
    clean_names() |>                             # standardize column names (e.g., "Year" and "YEAR" both become "year")
    mutate(enrollment = as.numeric(enrollment))  # make enrollment numeric in every file so the years can be bound together
}
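The `as.numeric()` step matters because the files do not agree on the type of the enrollment column: the column-specification messages show it as `dbl` for 2014-2015 through 2018-2019 and as character (`chr (8)`) from 2019-2020 on. `list_rbind()` refuses to combine a double column with a character one, while base `rbind()` would quietly coerce the whole column to character. The sketch below reproduces the idea in base R with two made-up data frames standing in for an early and a later file.

```r
# Toy stand-ins for an early file (enrollment read as numbers) and a later
# file (enrollment read as text); the values are made up
dfs <- list(
  data.frame(year = "2018-2019", enrollment = c(10, 20), stringsAsFactors = FALSE),
  data.frame(year = "2019-2020", enrollment = c("15", "25"), stringsAsFactors = FALSE)
)

# Before harmonizing, the same column has different classes
before <- vapply(dfs, function(df) class(df$enrollment), character(1))

# Harmonize: convert every enrollment column to numeric
harmonized <- lapply(dfs, function(df) {
  df$enrollment <- as.numeric(df$enrollment)
  df
})

# With matching types, the rows stack into a single numeric column
combined <- do.call(rbind, harmonized)

before               # "numeric" "character"
combined$enrollment  # 10 20 15 25
```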

enrollment_all <- enrollment_file_list |>
  map(importing_cleaning) |>  # read and clean each year's file
  list_rbind()                # stack the yearly tibbles into one
Rows: 2440 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Year, Region, County Name, District Name, District Number, Charter ...
dbl (1): Enrollment

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2416 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Year, Region, County Name, District Name, District Number, Charter ...
dbl (1): Enrollment

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2408 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Year, Region, County Name, District Name, District Number, Charter ...
dbl (1): Enrollment

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2402 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Year, Region, County Name, District Name, District Number, Charter ...
dbl (1): Enrollment

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2406 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): YEAR, REGION, COUNTY NAME, DISTRICT NUMBER, DISTRICT NAME, GENDER, ...
dbl (1): ENROLLMENT

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 2407 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): YEAR, REGION, COUNTY NAME, DISTRICT NUMBER, DISTRICT NAME, GENDER, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `enrollment = as.numeric(enrollment)`.
Caused by warning:
! NAs introduced by coercion
Rows: 2411 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): YEAR, REGION, COUNTY NAME, DISTRICT NUMBER, DISTRICT NAME, CHARTER ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `enrollment = as.numeric(enrollment)`.
Caused by warning:
! NAs introduced by coercion
Rows: 2417 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): YEAR, REGION, COUNTY NAME, DISTRICT NUMBER, DISTRICT NAME, CHARTER ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `enrollment = as.numeric(enrollment)`.
Caused by warning:
! NAs introduced by coercion
Rows: 2421 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): YEAR, REGION, COUNTY NAME, DISTRICT NUMBER, DISTRICT NAME, CHARTER ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `enrollment = as.numeric(enrollment)`.
Caused by warning:
! NAs introduced by coercion
Rows: 2417 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): YEAR, REGION, COUNTY NAME, DISTRICT NUMBER, DISTRICT NAME, CHARTER ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `enrollment = as.numeric(enrollment)`.
Caused by warning:
! NAs introduced by coercion
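The "NAs introduced by coercion" warnings for the files read as character mean that some enrollment entries were not numeric and became `NA`. Before accepting that, it can help to see exactly which raw values fail. Below is a minimal base-R sketch; `unparseable_values()` is a hypothetical helper and the sample values are made up, not taken from the TEA files.

```r
# Hypothetical helper: the distinct raw values that as.numeric() cannot parse
unparseable_values <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))  # silence the coercion warning
  sort(unique(x[!is.na(x) & is.na(parsed)])) # non-missing inputs that became NA
}

# Made-up raw values standing in for a character enrollment column
raw <- c("120", "45", "n/a", "3,050", NA, "45", "--")
unparseable_values(raw)
```

On one of the later files it could be applied to the raw column before the `mutate()` step, for example `read_csv(file, skip = 4) |> clean_names() |> pull(enrollment) |> unparseable_values()`. If the failures turn out to be thousands separators, `readr::parse_number()` can recover them; if they are placeholder codes, `NA` may be the right outcome.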
enrollment_all |> tail(500)
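`tail()` gives a visual check of the most recent rows. A programmatic check is to confirm that every cleaned file ended up with the same column names: `list_rbind()` matches columns by name, so the different header order across years is harmless, but a name that cleaned differently in one year would become its own column, filled with `NA` for every other year. A base-R sketch with a hypothetical helper and toy data frames:

```r
# Hypothetical helper: column names that are not present in every data frame
mismatched_columns <- function(dfs) {
  name_sets <- lapply(dfs, names)
  all_names <- unique(unlist(name_sets))  # every name seen in any frame
  shared <- Reduce(intersect, name_sets)  # names present in all frames
  setdiff(all_names, shared)
}

# Toy examples: same names in a different order vs. one renamed column
ok  <- list(data.frame(year = 1, district_name = "a"),
            data.frame(district_name = "b", year = 2))
bad <- list(data.frame(year = 1, district_name = "a"),
            data.frame(year = 2, district_nm = "b"))

mismatched_columns(ok)   # character(0): safe to bind by name
mismatched_columns(bad)  # "district_name" "district_nm"
```

In this notebook it could be run as `enrollment_file_list |> map(importing_cleaning) |> mismatched_columns()`.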
enrollment_by_district <- enrollment_all |>
  group_by(district_name, year) |>
  # sum enrollment within each district and year, then drop the grouping
  summarize(total_enrollment = sum(enrollment, na.rm = TRUE), .groups = "drop")
enrollment_by_district 
enrollment_by_district |> write_rds("data-processed/enrollment-by-district.rds")
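Summing by `district_name` and `year` treats the name as a unique district identifier. If two districts ever share a name, their enrollment is silently added together. Including the district number in the grouping avoids that, assuming `district_number` is formatted consistently across years. The sketch below is base R with made-up rows so it runs on its own; in the pipeline above, the equivalent change would be `group_by(district_number, district_name, year)`.

```r
# Made-up rows: two different districts ("001" and "002") that share a name
enr <- data.frame(
  district_number = c("001", "001", "002"),
  district_name   = c("Example ISD", "Example ISD", "Example ISD"),
  year            = "2023-2024",
  enrollment      = c(100, NA, 50),
  stringsAsFactors = FALSE
)

# Grouping by name and year merges the two districts into one total
by_name <- aggregate(enr["enrollment"], by = enr[c("district_name", "year")],
                     FUN = sum, na.rm = TRUE)

# Adding the district number keeps them apart
by_number <- aggregate(enr["enrollment"],
                       by = enr[c("district_number", "district_name", "year")],
                       FUN = sum, na.rm = TRUE)

by_name$enrollment    # 150
by_number$enrollment  # 100 50
```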