Trial: Sums
Welcome, Padawan. In this exercise, you’ll be tested on your skills with math-based summarize() functions. As in our previous Trials, J-327D will be asking for information from the starwars dataset, which comes with the tidyverse packages.
If you recall from our earlier exercise, starwars is a dataset with 87 Star Wars characters.
Select and filter
Your droid editor, J-327D, notes that Chewbacca is taller than the rest of the crew in the Millennium Falcon, but he wonders if he is just tall for a Wookiee. He asks you to find the average heights of Humans, Droids, Gungans and Wookiees from within the starwars data. He notes that the height variable is a measurement in centimeters.
J-327D has a specific goal in mind for this first part of the Trial. He says it would be helpful to focus on some specific data before we find the average height.
- Select only the name, height and species variables from the starwars data.
- Filter that to include only rows with species of "Human", "Droid", "Gungan", "Wookiee".
- Save the results into a new data frame called characters and then print out the new characters data frame.
characters <- starwars |>
  select(name, height, species) |>
  filter(species %in% c("Human", "Droid", "Gungan", "Wookiee"))

characters
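A side note on the filter above: %in% checks whether each species appears anywhere in the vector you give it. Here is a minimal sketch, assuming dplyr is loaded, that compares it with the longer version built from == tests joined with |. The names with_in and with_or are made up for this illustration and are not part of the Trial.

```r
library(dplyr)

# %in% keeps rows whose species matches any value in the vector
with_in <- starwars |>
  select(name, height, species) |>
  filter(species %in% c("Human", "Droid", "Gungan", "Wookiee"))

# The longer equivalent: one == comparison per species, joined with |
with_or <- starwars |>
  select(name, height, species) |>
  filter(
    species == "Human" | species == "Droid" |
      species == "Gungan" | species == "Wookiee"
  )

# Check whether both approaches return the same rows
identical(with_in, with_or)
```

Both versions drop characters whose species is NA, but the %in% version stays short no matter how many species you add to the vector.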
Start with your select. Fill in the blanks
starwars |>
  select(name, height, species)

J-327D notes there are several ways to filter for more than one species, but his favorite, most efficient way is through using %in%:
starwars |>
  select(name, height, species) |>
  filter(species ____ c("Human", "Droid", "Gungan", "Wookiee"))

Finding the mean
OK, now that we have a data frame with our selected rows based on species, we can find the answer to the question J-327D really wants to know.
In our last JedR Trial, we counted the number of characters (rows) that were different species. In this Trial we need to use group_by(), summarize() and arrange() again, but we must summarize to get the average height – or mean() – of each species in the data.
Do note:
- If you attempt to do math on values that include NA, the result will also be NA by default. To address this, we are using na.rm = TRUE to exclude the NA values from the calculation.
- We are calling the new summarized variable height_avg so our grade checker knows what to look for.
- Don’t forget that we often want to sort the height_avg by descending order, because journalists are typically interested in the most of something.
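To see why na.rm = TRUE matters, here is a small sketch using a made-up vector of heights with one missing value. The numbers and the name heights are hypothetical, chosen only to illustrate the default behavior of mean().

```r
# A hypothetical set of heights in centimeters, one of them missing
heights <- c(228, NA, 234)

# By default, a single NA makes the whole result NA
mean(heights)
# → NA

# na.rm = TRUE drops the NA before averaging: (228 + 234) / 2
mean(heights, na.rm = TRUE)
# → 231
```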
Question for this exercise
J-327D notes that we have a shorthand for a typical summarize workflow: GSA, which stands for group_by(), summarize() and arrange(). They typically come together in that order.
characters |>
  group_by(species) |>
  summarize(height_avg = mean(height, na.rm = TRUE)) |>
  arrange(desc(height_avg))
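An average says more when you know how many characters are behind it. As an optional sketch, not required by the grade checker, the same GSA pattern can carry a count alongside the mean. The names species_summary and characters_n are assumptions made for this example.

```r
library(dplyr)

# Rebuild the characters data frame so this block runs on its own
characters <- starwars |>
  select(name, height, species) |>
  filter(species %in% c("Human", "Droid", "Gungan", "Wookiee"))

species_summary <- characters |>
  group_by(species) |>
  summarize(
    # average height, ignoring characters whose height is NA
    height_avg = mean(height, na.rm = TRUE),
    # n() counts every row in the group, including those with NA height
    characters_n = n()
  ) |>
  arrange(desc(height_avg))

species_summary
```

Keep in mind that characters_n counts every character in the species, so it can be larger than the number of heights that actually went into the average.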
JedR challenge
What are the tallest species in the Star Wars Universe?
Wookiees are the tallest species, with an average height of 231 centimeters. That’s about 7 feet, 7 inches.
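If you want to check the conversion to feet and inches yourself, here is a quick sketch. It assumes the standard 2.54 centimeters per inch; the variable names are made up for this example.

```r
# Average Wookiee height from the summary, in centimeters
height_cm <- 231

# Convert to inches, then split into whole feet and leftover inches
total_inches <- height_cm / 2.54
feet <- total_inches %/% 12          # integer division: whole feet
inches <- round(total_inches %% 12)  # remainder, rounded to the nearest inch

feet
# → 7
inches
# → 7
```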
You’re done with this Trial
Not even the Death Star could stand in your way. Please inform your JedR Master that you have completed this trial. You are free to attempt the next JedR Trial or continue with JedR Training.