library(tidyverse)
library(janitor)
Quality checks
Because we are extracting this data from PDFs, I used this notebook to troubleshoot and make sure everything came through correctly.
Fixed issues
- In the Cleaning roster notebook, roster_type == “SUPPLEMENTAL SPOT 31” was changed to just “SUPPLEMENTAL SPOT” for consistency. The roster has the “31” in all cases, but it seems odd to keep here. I can change it later if needed.
- In others there were initially some missing players. I ended up piecing everything together in Cleaning other.
- Also in others, I had to rework how players were assigned different types and notes, because in the original list players can be listed more than once with different designations. I had to collapse all of that.
There aren’t any known issues as of now, but I’m keeping this notebook around.
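The two kinds of fix described above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the code from the cleaning notebooks: the player names, the `rosters_raw` and `others_raw` objects, and the exact approach (`str_remove()` for the label, `summarise()` with `any` for the collapse) are all assumptions.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the roster data (hypothetical rows).
rosters_raw <- tibble::tribble(
  ~name,      ~roster_type,
  "Player A", "SENIOR ROSTER",
  "Player B", "SUPPLEMENTAL SPOT 31"
)

# Drop the trailing " 31" so the label is simply "SUPPLEMENTAL SPOT".
rosters_fixed <- rosters_raw |>
  mutate(roster_type = str_remove(roster_type, " 31$"))

# Toy stand-in for the "others" list, where one player appears twice
# with different designations (hypothetical rows).
others_raw <- tibble::tribble(
  ~club_short, ~name,      ~type_dp, ~type_int,
  "ATL",       "Player C", TRUE,     FALSE,
  "ATL",       "Player C", FALSE,    TRUE,
  "ATL",       "Player D", FALSE,    TRUE
)

# Collapse to one row per player: a flag is TRUE if any listing set it.
others_collapsed <- others_raw |>
  group_by(club_short, name) |>
  summarise(across(starts_with("type_"), any), .groups = "drop")
```

Using `any` means a player keeps every designation that appeared in at least one of their listings, which matches the idea of collapsing repeated entries rather than picking one.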
Setup
Import the rosters
rosters <- read_rds("data-processed/rosters.rds")

rosters |> glimpse()
Rows: 868
Columns: 8
$ club_short <chr> "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "A…
$ club <chr> "Atlanta United", "Atlanta United", "Atlanta United…
$ roster_type <chr> "SENIOR ROSTER", "SENIOR ROSTER", "SENIOR ROSTER", …
$ name <chr> "Luis Abram", "Thiago Almada", "Josh Cohen", "Giorg…
$ roster_designation <chr> "TAM Player", "Young Designated Player", NA, "Desig…
$ current_status <chr> NA, NA, NA, NA, NA, NA, NA, "Unavailable - On Loan"…
$ contract_thru <chr> "2025", "2026", "2025", "2025", "2027", "2024", "20…
$ option_years <chr> "2026", NA, "2026", "2026", "2028", "2025", "2025",…
Do I have all the teams?
rosters |>
  count(club_short)
There are 29 teams in MLS as of May 2, 2024.
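Rather than counting rows by eye, the team count could be turned into a check that fails loudly. The sketch below assumes a hypothetical helper, `check_team_count()`, which is not part of this project, and runs it on toy data.

```r
library(dplyr)

# Hypothetical helper: error unless `data` has `expected` distinct clubs.
check_team_count <- function(data, expected = 29) {
  found <- n_distinct(data$club_short)
  if (found != expected) {
    stop("Expected ", expected, " teams but found ", found, call. = FALSE)
  }
  invisible(data)
}

# Toy data with three clubs (the real rosters should have 29).
toy <- tibble::tibble(club_short = c("ATL", "ATL", "ATX", "CLB"))
check_team_count(toy, expected = 3)
```

Because the helper returns its input invisibly, it could also sit in the middle of a pipe, for example `rosters |> check_team_count() |> count(club_short)`.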
Let’s spot check some teams.
rosters |> filter(club_short == "PHI")
Others
We look at the others file here.
others <- read_rds("data-processed/others.rds")

others |> glimpse()
Rows: 337
Columns: 11
Groups: club_short [29]
$ club_short <chr> "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", …
$ name <chr> "Aiden McFadden", "Bartosz Slisz", "Edwin Mosquera", "Er…
$ type_dp <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, F…
$ type_u22 <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FA…
$ type_int <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE…
$ type_inj <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ type_una <lgl> TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FAL…
$ notes_young <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
$ notes_unavail <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, F…
$ notes_notam <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, F…
$ notes_can <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
Missing teams
At one point we were missing teams. Here we make sure there are 29.
others |>
  count(club_short)
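If the count ever comes up short, an anti-join against the roster teams would name the missing clubs directly. This is a sketch on toy data; `rosters_toy` and `others_toy` are made-up stand-ins, and it assumes both tables share the `club_short` column.

```r
library(dplyr)

# Toy stand-ins: the roster side has three clubs, the others side only two.
rosters_toy <- tibble::tibble(club_short = c("ATL", "ATX", "CLB", "CLB"))
others_toy  <- tibble::tibble(club_short = c("ATL", "CLB"))

# Keep the clubs that appear in the rosters but have no match in others.
missing_teams <- rosters_toy |>
  distinct(club_short) |>
  anti_join(others_toy, by = "club_short")
```

An empty result would mean every roster team also appears in the others file.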
Checking season-ending list
I noticed this because we were missing injured players. Here I check for them.
others |>
  filter(type_inj == TRUE)
Profiles check
The last check of everything
profiles <- read_rds("data-processed/profiles.rds")

profiles |> filter(club_short == "CLB")
Example for index page
profiles |> filter(club_short == "ATX") |>
  head(2) |> glimpse()
Rows: 2
Columns: 17
$ club_short <chr> "ATX", "ATX"
$ club <chr> "Austin FC", "Austin FC"
$ roster_type <chr> "SENIOR ROSTER", "SENIOR ROSTER"
$ name <chr> "Guilherme Biro", "Julio Cascante"
$ roster_designation <chr> NA, "TAM Player"
$ current_status <chr> NA, NA
$ contract_thru <chr> "2026", "2025"
$ option_years <chr> "2027", "2026"
$ type_dp <lgl> FALSE, FALSE
$ type_u22 <lgl> FALSE, FALSE
$ type_int <lgl> TRUE, FALSE
$ type_inj <lgl> FALSE, FALSE
$ type_una <lgl> FALSE, FALSE
$ notes_young <lgl> FALSE, FALSE
$ notes_unavail <lgl> FALSE, FALSE
$ notes_notam <lgl> FALSE, FALSE
$ notes_can <lgl> FALSE, FALSE
How many teams
profiles |>
  count(club_short)
Stray header?
If you sort by roster designation, you’ll see a stray | by a U22 designated player at the top. Otherwise, the sort shows DPs first in a list you can then secondarily sort by team, which is muy helpful.
profiles |> filter(type_u22 == TRUE) |>
  filter(name == "Maximiliano David Ayala")
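To look for the stray character across all players instead of one name, a literal-string search is one option. The sketch below assumes the stray | ends up in the `roster_designation` column, which the note above does not confirm, and uses made-up rows.

```r
library(dplyr)
library(stringr)

# Toy stand-in for profiles (hypothetical names and designations).
profiles_toy <- tibble::tribble(
  ~name,      ~roster_designation,
  "Player A", "Designated Player",
  "Player B", "| U22 Initiative",
  "Player C", NA
)

# fixed("|") matches a literal pipe rather than treating it as regex "or".
# Rows with an NA designation are dropped by filter().
stray <- profiles_toy |>
  filter(str_detect(roster_designation, fixed("|")))
```

The same filter could be pointed at another column if the stray character turns out to live elsewhere.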