6  Military Surplus Analysis

This chapter is by Prof. McDonald, who uses macOS.

In the last chapter, we learned about the LESO data … that local law enforcement agencies can get surplus military equipment from the U.S. Department of Defense. We downloaded a combined version of the data, modified it for our purposes (used mutate() to create a new column calculated from other variables in the data) and then filtered it to a specific time period.

Throughout this lesson I’ll give you a chance to work out the code on your own, but then give you the answer and explain it. You won’t learn this if you just copy/paste. You need more than code that runs; you need to understand what the code is doing.

6.1 Learning goals of this lesson

In this chapter we will start querying the data using summarize with math, basically adding values in a column instead of counting rows, which we did with the Billboard assignment.

Our learning goals are:

  • To use the combination of group_by(), summarize() and arrange() to add columns of data using sum().
  • To use different group_by() groupings in specific ways to get desired results.
  • To practice using filter() on those summaries to better see certain results, including filtering in a vector.
  • To write data takeaways about some of the findings, practicing data-centric ledes and sentences describing data.

6.2 Questions to answer

All answers should be based on controlled items given to Texas agencies from Jan. 1, 2010 to present.

  • How many total “controlled” items were transferred to Texas agencies, and what are they all worth? We’ll summarize all the controlled items only to get the total quantity and total value of everything.
  • How many total “controlled” items did each agency get and how much was it all worth? Which agency got the most stuff?
    • How about local police agencies? What was the total quantity and value? I’ll give you a list.
  • What specific “controlled” items did each agency get and how much were they worth? Now we’re looking at the kinds of items.
    • What did local agencies get?

You’ll research some of the more interesting items the agencies received so you can include them in your data drop.

6.3 Set up the analysis notebook

Before we get into how to do this, let’s set up our analysis notebook.

  1. Make sure you have your military surplus project open in RStudio. If you have your import notebook open, close it and use Run > Restart R and Clear Output.
  2. Create a new Quarto Document with the title “Texas Analysis”.
  3. Save the notebook as 02-analysis.qmd.
  4. Update your _quarto.yml file to add the 02-analysis.qmd to your navigation.
  5. Remove the boilerplate text.
  6. Create a setup section (headline, text and code chunk) that loads the tidyverse library. Include the same execution options we had in the cleaning notebook.

We’ve started each notebook like this, so you should be able to do this on your own now. That said, here is the setup chunk with options:

```{r}
#| label: setup
#| message: false
#| warning: false

library(tidyverse)
```

6.3.1 Load the data into a tibble

  1. Next create an import section (headline, text and chunk) that loads the data from the previous notebook and save it into a tibble called leso.
  2. Add a glimpse() of the data for your reference.

We did this in Billboard and you should be able to do it. You’ll use read_rds() and find your data in your data-processed folder.

Import code

```r
leso <- read_rds("data-processed/01-leso-all.rds")

leso |> glimpse()
```

You should see the leso object in your Environment tab. You also have the glimpse like above so you have an idea of what each variable is in the data.

As a review, here is what the glimpse looks like:

```
Rows: 67,852
Columns: 13
$ state             <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
$ agency_name       <chr> "ABBEVILLE POLICE DEPT", "ABBEVILLE POLICE DEPT", "A…
$ nsn               <chr> "1005-01-587-7175", "1240-01-411-1265", "2320-01-371…
$ item_name         <chr> "MOUNT,RIFLE", "SIGHT,REFLEX", "TRUCK,UTILITY", "BAL…
$ quantity          <dbl> 10, 9, 1, 10, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ ui                <chr> "Each", "Each", "Each", "Kit", "Each", "Each", "Each…
$ acquisition_value <dbl> 1626.00, 371.00, 62627.00, 17264.71, 62627.00, 65800…
$ demil_code        <chr> "D", "D", "C", "D", "C", "C", "Q", "D", "C", "D", "D…
$ demil_ic          <dbl> 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ ship_date         <dttm> 2016-09-19, 2016-09-14, 2016-09-29, 2018-01-30, 201…
$ station_type      <chr> "State", "State", "State", "State", "State", "State"…
$ total_value       <dbl> 16260.00, 3339.00, 62627.00, 172647.10, 62627.00, 65…
$ control_type      <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
```

6.4 Filter to data of interest

For this analysis we only want to look at “controlled” items, and only those items given to Texas agencies. We’ll build these filters separately so we can save the data at each step.

6.4.1 Get the controlled items

For this analysis we want to focus on “controlled” items vs the more generic non-controlled items we learned about in the documentation. Let’s filter to capture just the controlled data for our analysis.

As you might recall from the Billboard project, the filter() function is our workhorse for focusing our data. In our import notebook we created our control_type column so we could do exactly this: Find only the rows of “controlled” items.

  1. Start a new Markdown section and note you are getting controlled items.

Let’s first count how many of these items are controlled vs. non-controlled. This is much like counting songs by performer: We can use GSA and n() to count the number of rows for control_type, the column we created in the cleaning notebook.

  1. Try to write a quick count of the leso data that uses group_by() on the control_type column and summarizes using n().
Try it yourself. The result is below.

```r
leso |> 
  group_by(control_type) |> 
  summarise(number_items = n())
```

We can see from this result that there are 7918 items that are not controlled, and 59934 that are. When we write our filter, we should end up with 59934 observations, which is only the TRUE ones.

  1. Start with your leso data, but then filter it to control_type == TRUE.
  2. Save the result into a new tibble called leso_c.
Filtering control type

```r
leso_c <- leso |> 
  filter(control_type == TRUE)
```

To test that this worked you could do the same GSA on control_type we used above, or even glimpse() which shows the number of rows.
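If you want to check it with code, a minimal optional sketch is to simply re-run the count from above on the new tibble (nothing new here, so feel free to skip it):

```r
# Optional check: after the filter, only TRUE rows should remain
leso_c |> 
  group_by(control_type) |> 
  summarise(number_items = n())
```

You should get a single row with control_type of TRUE and 59934 items.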

If you really wanted to just count the number of rows, there is another function, nrow().

```r
leso_c |> nrow()
```

```
[1] 59934
```

At this point you have a new tibble called leso_c that has only the weapons and other controlled property, so now we can take a closer look at that data.

6.4.2 Filter for Texas

In this analysis we only want to look at Texas data so we’ll make a new object with just that. Remember, you want to do each step and run it to make sure it is working.

  1. Create a new section with headlines and text that denote you are filtering the data to Texas.
  2. Create the code chunk and start your filter process using the leso_c tibble.
  3. Use filter() on the state column to keep all rows with “TX”.
  4. Edit the chunk to put all that into a new object called leso_c_tx.
  5. Print out the tibble.
You know the drill by now

```r
leso_c_tx <- leso_c |> 
  filter(
    state == "TX"
  )

leso_c_tx
```

How do you know if it worked? Well, the first column in the data is the state column, so every row should show “TX”. We are also down to 4,692 rows.
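If you’d rather check with code than eyeball it, here is an optional sketch using distinct(), a tidyverse function that returns the unique values in a column:

```r
# Optional check: which state values remain, and how many rows?
leso_c_tx |> distinct(state)

leso_c_tx |> nrow()
```

distinct() should return a single row with “TX”, and nrow() should give you 4,692.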

6.5 Building summaries with math

As we get into the first quest, let’s talk about “how” we go about these summaries.

When I am querying my data, I start by envisioning what the result should look like. Let’s take the first question: How many total “controlled” items were transferred to Texas agencies, and what are they all worth?

Let’s glimpse the data …

```r
leso_c_tx |> glimpse()
```

```
Rows: 4,692
Columns: 13
$ state             <chr> "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX"…
$ agency_name       <chr> "ABERNATHY POLICE DEPT", "ABERNATHY POLICE DEPT", "A…
$ nsn               <chr> "1240-01-540-3690", "2320-01-371-9584", "2310-01-111…
$ item_name         <chr> "SIGHT,REFLEX", "TRUCK,UTILITY", "TRUCK,AMBULANCE", …
$ quantity          <dbl> 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1…
$ ui                <chr> "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
$ acquisition_value <dbl> 371.00, 62627.00, 96466.00, 120.00, 120.00, 120.00, …
$ demil_code        <chr> "Q", "C", "C", "D", "D", "D", "D", "D", "Q", "C", "C…
$ demil_ic          <dbl> 3, 1, 1, 1, 1, 1, 1, 1, 3, 1, NA, 1, NA, 1, NA, NA, …
$ ship_date         <dttm> 2016-02-02, 2016-03-07, 2020-12-03, 2011-09-13, 201…
$ station_type      <chr> "State", "State", "State", "State", "State", "State"…
$ total_value       <dbl> 1855.00, 62627.00, 96466.00, 120.00, 120.00, 120.00,…
$ control_type      <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
```

… and then talk through the logic:

  • “How many total controlled items” is how many individual things were transferred. We have this quantity column that has the number of items in each row, so if we want the total for the data set, we can add together all the values in that column. We do this within a summarize() but instead of counting rows using n(), we’ll use the function sum(quantity) which will add all the values in quantity column together.
  • “… what are they all worth” is very similar, but we want to add together all those values in our total_value column. (Remember, we don’t want to use acquisition_value because that is the value of only ONE item, not the total for the row.)

6.5.1 Summarize across the data

So, let’s put this together with code.

  1. Start a new Markdown section noting that you are getting total values.
  2. Add a code chunk like below and run it.
```r
leso_c_tx |>
  summarise(
    sum(quantity),
    sum(total_value)
  )
```

1. We start with the leso_c_tx tibble of the controlled items, then we pipe into summarize().
2. Because we are going to add multiple things, I put them on separate lines just to make this more readable.

You’ll notice that the names of the columns are the function names. We can “name” our new columns just like we did in Billboard. We could call this whatever we want, but good practice is to name it what it is. We’ll use good naming techniques and split the words using _. I also use all lowercase characters.

  1. Edit your chunk to add in the new column names and run it.
```r
leso_c_tx |> 
  summarise(
    summed_quantity = sum(quantity),
    summed_total_value = sum(total_value)
  )
```

1. Note we added summed_quantity = to name the first new column …
2. … and summed_total_value = to name the second.

OK, from this we have learned something: The summed_quantity value there is the number of items all Texas law enforcement agencies have received, and the summed_total_value is how much they were worth.

6.5.2 Data Takeaway: Totals since 2010

  1. Write a Data Takeaway sentence in your notebook summing up the totals in a way that you could drop into a story.

6.5.3 NA values in a sum, mean and median

We’ll do summarize again below with grouping, but first we need to talk about NA values.

When we do math like this within summarize we need to take special note if our column has any blank values, called NA, as in “Not Available”. If there are, then you will get NA for the result. R will NOT do the math on the remaining values unless you tell it so. This is true not only for sum(), but also for mean() which gets an average, and for median() which finds the middle number in a column.

There is a way to get around this by including an argument within the mathy function: sum(col_name, na.rm = TRUE).

I can show this with the demil_ic column which is a number datatype with some NA values. To be clear, the demil_ic variable isn’t really designed to do math on it as it is really a category, but it will show what I’m talking about here.

```r
# You don't have to add this to your notebook.
# I'm just explaining concepts
leso_c_tx |> 
  summarise(
    dumb_sum = sum(demil_ic),
    less_dumb_sum = sum(demil_ic, na.rm = TRUE),
    dumb_avg = mean(demil_ic),
    less_dumb_avg = mean(demil_ic, na.rm = TRUE)
  )
```

So there you have examples of using sum() and mean() with and without na.rm = TRUE. OK, you’ve been warned.

6.6 Totals by agency

OK, your next question is this: For each agency, how many things did they get and how much was it all worth?

The key part of thinking about this logically is For each agency. That “for each” is a clue that we need group_by() for something. We basically need what we did above, but we first need to group_by the agency_name.

Let’s break this question down:

  • “For each agency” tells me I need to group_by the agency_name so I can summarize totals within each agency.
  • “how many total things” means how many items. Like before, we have the quantity variable, so we need to add all those together within summarize like we did above.
  • “how much was it worth” is another sum, but this time we want to sum the total_value column.

So I envision my result looking like this:

| agency_name | summed_quantity | summed_total_value |
|---|---|---|
| AFAKE POLICE DEPT | 6419 | 10825707.5 |
| BFAKE SHERIFF’S OFFICE | 381 | 3776291.52 |
| CFAKE SHERIFF’S OFFICE | 270 | 3464741.36 |
| DFAKE POLICE DEPT | 1082 | 3100420.57 |

The first columns in that summary will be our grouped values. This example is only grouping by one thing, agency_name. The other two columns are the summed values I’m looking to generate.

6.6.1 Group_by, then summary with math

We’ll start with the total_quantity.

  1. Add a new section (headline, text and chunk) that describes the second quest: For each agency in Texas, find the summed quantity and summed total value of the equipment they received.
  2. Add the code below into the chunk and run it.
```r
leso_c_tx |>
  group_by(agency_name) |>
  summarize(
    summed_quantity = sum(quantity)
  )
```

1. We start with the leso_c_tx, which is the “controlled” data, and then …
2. We group by agency_name. This organizes our data (behind the scenes) so our summarize actions will happen within each agency. Now I normally say run your code one line at a time, but you would not be able to see the groupings at this point, so I usually write group_by() and summarize() together.
3. In summarize() we first name our new column: summed_quantity, then we set that column to equal = the sum of all values in the quantity column. sum() is the function, and we feed it the column we want to add together: quantity. I put this inside of the summarize function on its own line because we will add to it. It enhances readability. RStudio will help you with the indenting, etc.

If you look at the first line of the return, it is taking all the rows for the “ABERNATHY POLICE DEPT” and then adding together all the values in the quantity field.

6.6.2 Add the total_value

We don’t have to stop at one summary. We can perform multiple summarize actions on the same or different columns within the same expression.

Edit your summary chunk to:

  1. Add a comma after the first summarize action.
  2. Add the new expression to give us the summed_total_value and run it.
```r
leso_c_tx |> 
  group_by(agency_name) |> 
  summarize(
    summed_quantity = sum(quantity),
    summed_total_value = sum(total_value)
  )
```

Here we add the summed_total_value.

6.6.3 Check the math

If you wanted to test this (and that is a real good idea), you might look at the data from one of the values and check the math. Here are the Abernathy rows. I usually do these tests in a code chunk of their own, and sometimes I delete them after I’m sure my logic works.

```r
leso_c_tx |> 
  filter(agency_name == "ABERNATHY POLICE DEPT") |> 
  select(agency_name, quantity, total_value)
```

If we look at the quantity column there we can eyeball all the rows and add them on a calculator. Our answer should match our summed_quantity in the summary table.

If we add up the total_value rows then we should end up with the same number as summed_total_value above.
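If you’d rather let R do that arithmetic, here is an optional sketch that sums just the Abernathy rows; it simply combines the filter above with the same sum() calls we used in the grouped summary:

```r
# Optional check: sum only the Abernathy rows and compare to the grouped summary
leso_c_tx |> 
  filter(agency_name == "ABERNATHY POLICE DEPT") |> 
  summarize(
    check_quantity = sum(quantity),
    check_total_value = sum(total_value)
  )
```

The two numbers should match the Abernathy row in the summary table above.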

6.6.4 Arrange the results

OK, this gives us our answers, but in alphabetical order. We want to arrange the data so it gives us the most summed_total_value in descending order.

  1. EDIT your block to add an arrange() function, as shown below.
```r
leso_c_tx |> 
  group_by(agency_name) |> 
  summarize(
    summed_quantity = sum(quantity),
    summed_total_value = sum(total_value)
  ) |> 
  arrange(summed_total_value |> desc())
```

Adding the pipe and arrange() at the end of the chain.

So now we’ve sorted the results to put the highest summed_total_value at the top.

Remember, there are two ways we can set up that arrange() function in descending order:

  • arrange(summed_total_value |> desc())
  • arrange(desc(summed_total_value))

Both work and are correct. It really is your preference.

6.6.5 Consider the results

Is there anything that sticks out in that list? It helps if you know a little bit about Texas cities and counties, but here are some thoughts to ponder:

  • Houston is the largest city in the state (4th largest in the country, in fact). It makes sense that it tops the list. Same for Harris County or even the state police force. Austin being up there is also not crazy, as we have almost a million people.
  • But what about San Marcos (pop. 63,220)? Or Milam County (pop. 24,770)? Those are way smaller cities and law enforcement agencies. They might be worth looking into.

Perhaps we should look at some of the police agencies closest to us.

6.6.6 Data Takeaway: Texas totals

  1. After your last code chunk here, write a Data Takeaway from what you see in this result.

6.7 Looking at local agencies

Our second quest had a second part: How about local police agencies? What was the total quantity and value?

We’ll take the summary above, but then filter it to show only local agencies of interest.

6.7.1 Save our “by agency” list

Since we want to take an existing summary and add more filtering to it, it makes sense to go back into that chunk and save it into a new object so we can reuse it.

  1. EDIT your existing summary chunk to save the result into a new tibble. Name it tx_agency_totals so we are all on the same page.
  2. Add a new line that prints the result to the screen so you can still see it.
```r
tx_agency_totals <- leso_c_tx |>
  group_by(agency_name) |> 
  summarize(
    summed_quantity = sum(quantity),
    summed_total_value = sum(total_value)
  ) |> 
  arrange(summed_total_value |> desc())

tx_agency_totals
```

1. Creating the new object tx_agency_totals to put this summary into.
2. Here we print out the new object, which we can reuse later.

6.7.2 Filtering within a vector

Let’s talk through the filter concepts before you try it with this data.

When we talked about filtering with the Billboard project, we discussed using the | operator as an “OR” function. If we were to apply that logic here, it would look like this:

```r
# Just for explanation. Don't add to notebook.
data |> 
  filter(column_name == "Text to find" | column_name == "More text to find")
```

That can get pretty unwieldy if you have more than a couple of things to look for.

There is another operator %in% where we can search for multiple items from a list. (This list of items is officially called a “vector”.) Think of it like this in plain English: Filter the column for things in this list.

```r
# Just for explanation. Don't add to notebook.
data |> 
  filter(col_name %in% c("This string", "That string"))
```

We can take this a step further by saving the items in our list into an R object so we can reuse that list and not have to type out all the terms each time we use them.

```r
# Just for explanation. Don't add to notebook.
list_of_strings <- c(
  "This string",
  "That string"
)

data |> 
  filter(col_name %in% list_of_strings)
```

This last version is the easiest to read and the most flexible to maintain. We can make adjustments to the list as needed.

6.7.3 Create a vector to build this filter

In the interest of time, I’m going to give you the list of local police agencies to filter on. To be clear, I did considerable work to figure out the exact names of these agencies. I consulted a different data set that lists all law enforcement agencies in Texas and then I used some creative filtering to find their “official” names in the leso data. It also helps that I’m familiar with local cities and counties so I can recognize the names. I don’t want to get sidetracked on that process here so I’ll give you the list and show you how to use it.

  1. Create a new section (headline, text and chunk) and describe that you are filtering the summed quantity/values for local agencies.
  2. Add the code below into that chunk.
```r
local_agencies <- c(
  "AUSTIN PARKS POLICE DEPT", #NI
  "AUSTIN POLICE DEPT",
  "BASTROP COUNTY SHERIFF'S OFFICE",
  "BASTROP POLICE DEPT",
  "BEE CAVE POLICE DEPT",
  "BUDA POLICE DEPT",
  "CALDWELL COUNTY SHERIFFS OFFICE",
  "CEDAR PARK POLICE DEPT",
  "ELGIN POLICE DEPARTMENT",
  "FLORENCE POLICE DEPT", #NI
  "GEORGETOWN POLICE DEPT",
  "GRANGER POLICE DEPT", #NI
  "HAYS CO CONSTABLE PRECINCT 4",
  "HAYS COUNTY SHERIFFS OFFICE",
  "HUTTO POLICE DEPT",
  "JARRELL POLICE DEPT", #NI
  "JONESTOWN POLICE DEPT", #NI
  "KYLE POLICE DEPT",
  "LAGO VISTA POLICE DEPT",
  "LAKEWAY POLICE DEPT", #NI
  "LEANDER POLICE DEPT",
  "LIBERTY HILL POLICE DEPT", #NI
  "LOCKHART POLICE DEPT",
  "LULING POLICE DEPT",
  "MANOR POLICE DEPT",
  "MARTINDALE POLICE DEPT", #NI
  "PFLUGERVILLE POLICE DEPT",
  "ROLLINGWOOD POLICE DEPT", #NI
  "SAN MARCOS POLICE DEPT",
  "SMITHVILLE POLICE DEPT", #NI
  "SUNSET VALLEY POLICE DEPT", #NI
  "TAYLOR POLICE DEPT", #NI
  "THRALL POLICE DEPT", #NI
  # TEXAS STATE UNIVERSITY HI_ED
  "TRAVIS COUNTY SHERIFFS OFFICE",
  # TRAVIS CONSTABLE OFFICE,
  # SOUTHWESTERN UNIVERSITY HI_ID
  "WESTLAKE HILLS POLICE DEPT", #NI
  "UNIV OF TEXAS SYSTEM POLICE HI_ED",
  "WILLIAMSON COUNTY SHERIFF'S OFFICE"
)

tx_agency_totals |>
  filter(agency_name %in% local_agencies)
```

1. We start by giving our list of agencies a name: local_agencies. This creates an R object that we can reuse. We will need this list a number of times, and it makes sense to manage it in one place instead of retyping it each time we need it.
2. Next we fill that object with a list of agency names. Again, I did some pre-work to figure out those names that we aren’t covering here, and in some cases there is a comment #NI next to them. That is a note to myself that the particular agency is NOT INCLUDED in our data, which means I haven’t confirmed the name spelling. It could be that at a later date the Austin Parks department gets equipment, but they list their name as “CITY OF AUSTIN PARKS” instead of “AUSTIN PARKS POLICE DEPT” and it would not be filtered properly. This is another example that your most important audience for your code is your future self.
3. Now that our list is created, we can use it to filter our tx_agency_totals data. So, we start with that data, and then …
4. We pipe into filter(), but inside our filter we don’t set it == to a single thing; instead we say agency_name %in% local_agencies, which says look inside that agency_name column and keep any row that has a value that is also in our local_agencies vector.

6.7.4 Data Takeaway: Local totals

This filters our list of agencies to just those in Central Texas. Do you see anything interesting there?

  1. Write your Data Takeaway about what you’ve found.

6.8 Types of items shipped to each agency

Now that we have an overall idea of what local agencies are doing, let’s dive a little deeper. It’s time to figure out the specific items that they received.

Our question is this: What specific “controlled” items did each agency get and how much were they worth?

In some cases an agency might get the same item shipped to them at different times. For instance, “AUSTIN POLICE DEPT” has multiple rows where they get a single “ILLUMINATOR,INTEGRATED,SMALL ARM” shipped to them on the same date, but then on other dates they have the same item but the quantity is 30. We want all of these “ILLUMINATOR,INTEGRATED,SMALL ARM” for the Austin police added together into a single record.
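If you want to see those repeated rows before we group them, here is an optional peek. I’m using str_detect() from the stringr package (it loads with the tidyverse) so I don’t have to type the exact item name; consider it a side trip, not a required step.

```r
# Optional peek: Austin's illuminator shipments appear on several rows
leso_c_tx |> 
  filter(
    agency_name == "AUSTIN POLICE DEPT",
    str_detect(item_name, "ILLUMINATOR")
  ) |> 
  select(item_name, quantity, ship_date, total_value)
```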

The logic works like this:

  • Start with the controlled data, and then …
  • Group by the agency_name and item_name, which will group all the rows where those values are the same. All “AUSTIN POLICE DEPT” rows with “ILLUMINATOR,INTEGRATED,SMALL ARM” will be considered together, and then …
  • Summarize to sum the quantity, and then do the same for total_value.

The code for this is very similar to what we did above when we summarized agencies, except we are grouping by two things, the agency_name and the item_name. Let’s do it:

  1. Create a new section (headline, text and code chunk) and describe that you are finding the sums for each item that each agency has received since 2010.
  2. Consult (or even copy) the code you wrote when you created the agency totals, but modify the group_by() to add the item_name, like this: group_by(agency_name, item_name).
  3. Be sure you rename your created R objects, too, perhaps to tx_agency_item_totals.
Give it a go. Below is the result.

```r
tx_agency_item_totals <- leso_c_tx |>
  group_by(agency_name, item_name) |>
  summarize(
    summed_quantity = sum(quantity),
    summed_total_value = sum(total_value)
  ) |> 
  arrange(summed_total_value |> desc())

tx_agency_item_totals
```

1. Be sure to create a new object.
2. Here we add the second grouping variable.

Reusing code like this – copying the agency grouping code and editing it to add the item_name value – is very common in coding, and there is nothing wrong with it as long as you are careful.

When you reuse code, review it carefully so you don’t override things by accident. In this instance, our original code created an R object: tx_agency_totals <- leso_c_tx |> ... that holds the result of our functions, and we call that later to view it. If we reuse this code and don’t update that object name, we’ll reset the values inside that already-existing object, which was not our intent. We want to create a NEW thing tx_agency_item_totals so we can use that later, too. And if we don’t update the “peek” at the object, we’ll be looking at the old one instead of the new one.

Note

With that last code chunk you might see a warning in your R Console: summarise() has grouped output by ‘agency_name’. You can override using the .groups argument. This is not a problem, it’s just a reminder that when we group by more than one thing, the first grouping is retained when future functions are applied to this result. It’s more confusing than helpful, to be honest. Just know if we wanted to do further manipulation, we might need to use ungroup() first.
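If you’re curious, a minimal sketch of what that looks like is below; we don’t actually need it for anything in this chapter.

```r
# Hypothetical: drop the leftover agency_name grouping before doing more work
tx_agency_item_totals |> 
  ungroup()
```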

6.8.1 Items for local agencies

Just like we did for our agency totals, we want to filter this list of items to those sent to our local agencies. However, this time we’ve already created the list of local agencies so we don’t have to redo that part … we just need to filter by it.

  1. Start a new section (headline, text) and explain that you are looking at items sent to local agencies.
  2. Use filter() to focus the data just on our local_agencies.
```r
tx_agency_item_totals |> 
  filter(agency_name %in% local_agencies)
```

Because our original list arranged the data by the most expensive items, we can see that here. But it might be easier to rearrange the data by agency name first, then the most expensive items.

  1. EDIT your chunk to add an arrange function.
  2. Within the arrange, set it to agency_name first, then summed_total_value in descending order.
```r
tx_agency_item_totals |> 
  filter(agency_name %in% local_agencies) |> 
  arrange(agency_name, desc(summed_total_value))
```

6.8.2 Research some interesting items

You’ll want a little more detail about some of these items for your data drop. I realize (and you should, too) that for a “real” story we would need to reach out to sources for more information, but you can search online to learn enough to at least describe some of these items. There are a couple of ways to go about researching items.

  1. Simply search for the items on Google, like this.
  2. Each item has a “National Stock Number,” which is an ID for a government database of supplies. You can search the data for the nsn value and then look up that value online.

Let’s do an example, looking up “ILLUMINATOR,INTEGRATED,SMALL ARMS”. First we filter the data to find the item. (I’m also using select() so we can control the output and see it in this book, but you might not want that in your notebook, so you can also check ship_date, etc.)

```r
leso_c_tx |> 
  filter(
    item_name == "ILLUMINATOR,INTEGRATED,SMALL ARMS",
    agency_name == "AUSTIN POLICE DEPT"
  ) |> 
  select(item_name, nsn)
```

It looks like most of these illuminators use this nsn: 5855-01-534-5931.
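You can also flip that around inside the data: once you have an nsn, an optional filter like this (just a sketch) shows every Texas shipment under that stock number.

```r
# Optional: which agencies received items under this stock number?
leso_c_tx |> 
  filter(nsn == "5855-01-534-5931") |> 
  select(agency_name, item_name, quantity, total_value)
```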

Now we go to the website https://nationalstocknumber.info/ and plug that number into the search bar. We get a couple of returns, and if we click on one we get this page. It gives us a description of the item:

“A device which is a combination of several lasers and white light illumination used to provide multiple capabilities for engaging targets and providing light. The device may contain a flashlight or other white light illumination source, an illuminator, infrared and stand alone aiming lasers/pointers. The device has the capability to mount on an individual weapon.”

Not every item is in the database, but it is worth checking.

6.8.3 Data Takeaway: Local items

Now that you’ve done a bit of research into some of these items, write prose-style data takeaways about items of interest.

6.9 Update index summary

I know there has been a lot of new code here, but learning the code is only half the battle. What have we learned about the data? It’s time to summarize our findings.

  1. Open your index.qmd file.
  2. After the text that is here, pick the most interesting data fact you’ve found and write a story lede and source graf.
  3. Follow that with the rest of your data takeaways, written as prose that you could drop into a story.

If you need a reminder of the style, look at the Billboard example.

6.10 Turn in your project

  1. Make sure everything runs and Renders properly.
  2. Publish your changes to Quarto Pub and include the link to your project in your index notebook so I can bask in your glory.
  3. Zip your project folder. (Or export to zip if you are using posit.cloud).
  4. Upload to the Canvas assignment.
Important

To be clear, it is your zipped project I am grading. The Quarto Pub link is for convenience.

6.11 What we learned in this chapter

  • We used sum() within a group_by()/summarize() function to add values within a column. Sometimes sum() needs the argument na.rm = TRUE.
  • We used summary() to get descriptive statistics about our data, like the minimum and maximum values, or an average (mean).
  • We learned how to use c() to combine a list of like values into a vector, and then used that vector to filter a column for values %in% that vector.
  • We used nrow() to count the number of rows in an object.