Trial: Sums
Welcome, Padawan. In this exercise, you’ll be tested on your skills using math-based summarize() functions. Like our previous Trials, J-327D will be asking for information from the starwars dataset in the tidyverse packages.
If you recall from our earlier exercise, starwars is a dataset with 87 Star Wars characters.
Select and filter
Your droid editor, J-327D, notes that Chewbacca is taller than the rest of the crew in the Millennium Falcon, but he wonders if Chewbacca is just tall for a Wookiee. He asks you to find the average heights of Humans, Droids, Gungans and Wookiees from within the starwars data. He notes that the height variable is a measurement in centimeters.
J-327D has a specific goal in mind for this first part of the Trial. He says it would be helpful to focus on some specific data before we find the average height.
- Select only the name, height, species variables from the starwars data.
- Filter that to include only rows with species of "Human", "Droid", "Gungan", "Wookiee".
- Save the results into a new data frame called characters and then print out the new characters data frame.
characters <- starwars |>
  select(name, height, species) |>
  filter(species %in% c("Human", "Droid", "Gungan", "Wookiee"))

characters
Start with your select. Fill in the blanks
starwars |>
  select(name, height, species)
J-327D notes there are several ways to filter for more than one species, but his favorite, most efficient way is to use %in%:
starwars |>
  select(name, height, species) |>
  filter(species ____ c("Human", "Droid", "Gungan", "Wookiee"))
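To see why J-327D prefers %in%, here is a small sketch comparing it with one of the other ways to filter for several species: chaining one == test per species with |. The name crew_species is made up for this illustration, and the sketch assumes dplyr is installed, since that is where starwars lives.

```r
library(dplyr)  # provides starwars, select() and filter()

# Hypothetical helper vector holding the species we care about
crew_species <- c("Human", "Droid", "Gungan", "Wookiee")

# Approach 1: test membership in a vector with %in%
by_in <- starwars |>
  select(name, height, species) |>
  filter(species %in% crew_species)

# Approach 2: chain one == comparison per species with |
by_or <- starwars |>
  select(name, height, species) |>
  filter(
    species == "Human" | species == "Droid" |
      species == "Gungan" | species == "Wookiee"
  )

# Both approaches should keep the same characters
identical(by_in$name, by_or$name)
```

Both versions should keep the same rows, but the %in% version stays one line long no matter how many species you add to the vector.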
Finding the mean
OK, now that we have a data frame with our selected rows based on species, we can find the answer to the question J-327D really wants to know.
In our last JedR Trial, we counted the number of characters (rows) that were different species. In this Trial we need to use group_by(), summarize() and arrange() again, but this time we must summarize to get the average height, or mean(), of each species in the data.
Do note:
- If you attempt to take the mean of values that include NA, the result will also be NA by default. To address this, we are using na.rm = TRUE to exclude the NA values.
- We are calling the new summarized variable height_avg so our grade checker knows what to look for.
- Don’t forget that we often want to sort height_avg in descending order, because journalists are typically interested in the most of something.
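The NA behavior is easier to trust once you have seen it. This is a minimal sketch in base R using a made-up vector of heights (the values and the name heights are only for illustration, not pulled from the data frame):

```r
# Hypothetical heights in centimeters; the NA stands in for a missing measurement
heights <- c(228, 234, NA)

# Without na.rm, a single missing value makes the whole result NA
mean_default <- mean(heights)

# With na.rm = TRUE, the NA is dropped first, leaving the mean of 228 and 234
mean_no_na <- mean(heights, na.rm = TRUE)

mean_default  # NA
mean_no_na    # 231
```

The same argument works inside summarize(), which is why it appears in the mean() call for this exercise.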
Question for this exercise
J-327D notes that we have a saying for a typical summarize operation: GSA, which stands for group_by(), summarize() and arrange(). They typically come together in that order.
characters |>
  group_by(species) |>
  summarize(height_avg = mean(height, na.rm = TRUE)) |>
  arrange(desc(height_avg))
JedR challenge
What is the tallest species in the Star Wars Universe?
Wookiees are the tallest species with an average height of 231 centimeters. That’s about 7 feet, 7 inches.
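If you want to check the conversion to feet and inches yourself, the arithmetic can be sketched in base R like this. The variable names are made up for illustration, and the only outside fact used is that one inch is 2.54 centimeters.

```r
height_cm <- 231      # average Wookiee height from the summary
cm_per_inch <- 2.54   # one inch is exactly 2.54 centimeters

total_inches <- height_cm / cm_per_inch   # about 90.94 inches
feet <- total_inches %/% 12               # whole feet (integer division)
inches <- round(total_inches %% 12)       # leftover inches, rounded

feet    # 7
inches  # 7
```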
You’re done with this Trial
Like a true Jedi, you face challenges with courage and wisdom. Please inform your JedR Master that you have completed this trial. You are free to attempt the next JedR Trial or continue with JedR Training.