```{r}
#| label: setup
#| message: false
#| warning: false
library(tidyverse)
library(janitor)
```
6 Military Surplus Analysis
Because of data updates, your answers may differ slightly from what is presented here, especially in videos and gifs.
In the last chapter, we learned about the LESO data … that local law enforcement agencies can get surplus military equipment from the U.S. Department of Defense. We downloaded a combined version of the data, modified it for our purposes (used mutate()
to create a new column calculated from other variables in the data) and then filtered it to a specific time period.
Throughout this lesson I’ll give you a chance to work out the code on your own, but then give you the answer and explain it. You won’t learn this if you just copy/paste. You need more than code that runs; you need to understand what the code is doing.
6.1 Learning goals for this chapter
In this chapter we will start querying the data using summarize with math, basically adding values in a column instead of counting rows, which we did with the Billboard assignment.
Specifically, we’ll will:
- Use the combination of
group_by()
,summarize()
andarrange()
to add columns of data usingsum()
. - Use different
group_by()
groupings in specific ways to get desired results. - Practice using
filter()
on those summaries to better see certain results, including filtering in a vector. - We’ll write about some of the findings, practicing data-centric ledes and sentences describing data.
6.2 Questions to answer
All answers should be based on controlled items given to Texas agencies from Jan. 1, 2010 to present.
- How many total “controlled” items were transferred to Texas agencies, and what are they all worth? We’ll summarize all the controlled items only to get the total quantity and total value of everything.
- How many total “controlled” items did each agency get and how much was it all worth? Which agency got the most stuff?
- How about local police agencies? What was the total quantity and value I’ll give you a list.
- What specific “controlled” items did each agency get and how much were they worth? Now we’re looking at the kinds of items.
- What did local agencies get?
You’ll research some of the more interesting items the agencies received so you can include them in your data drop.
6.3 Set up the analysis notebook
Before we get into how to do this, let’s set up our analysis notebook.
- Make sure you have your military surplus project open in RStudio. If you have your import notebook open, close it and use Run > Restart R and Clear Output.
- Create a new Quarto Document with the title “Texas Analysis”.
- Save the notebook at
02-analysis.qmd
. - Update your
_quarto.yml
file to add the02-analysis.qmd
to your navigation. - Remove the boilerplate text.
- Create a setup section (headline, text and code chunk) that loads the tidyverse library. Include the same execution options we had in the cleaning notebook.
We’ve started each notebook like this, so you should be able to do this on your own now. That said, here is the setup chunk with options:
6.3.1 Load the data into a tibble
- Next create an import section (headline, text and chunk) that loads the data from the previous notebook and save it into a tibble called
leso
. - Add a
glimpse()
of the data for your reference.
We did this in Billboard and you should be able to do it. You’ll use read_rds()
and find your data in your data-processed folder.
Import code
<- read_rds("data-processed/01-leso-all.rds")
leso
|> glimpse() leso
You should see the leso
object in your Environment tab. You also have the glimpse like above so you an idea of what each variable is in the data.
As a review, here is what the glimpse looks like:
Rows: 65,172
Columns: 13
$ state <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
$ agency_name <chr> "ABBEVILLE POLICE DEPT", "ABBEVILLE POLICE DEPT", "A…
$ nsn <chr> "2540-01-565-4700", "1385-01-574-4707", "1005-01-587…
$ item_name <chr> "BALLISTIC BLANKET KIT", "UNMANNED VEHICLE,GROUND", …
$ quantity <dbl> 10, 1, 10, 9, 1, 1, 1, 18, 1, 2, 3, 10, 1, 1, 2, 1, …
$ ui <chr> "Kit", "Each", "Each", "Each", "Each", "Each", "Each…
$ acquisition_value <dbl> 16854.24, 10000.00, 1626.00, 365.00, 62627.00, 65800…
$ demil_code <chr> "D", "Q", "D", "D", "C", "C", "C", "Q", "Q", "Q", "Q…
$ demil_ic <dbl> 1, 3, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3…
$ ship_date <dttm> 2018-01-30 00:00:00, 2017-03-28 00:00:00, 2016-09-1…
$ station_type <chr> "State", "State", "State", "State", "State", "State"…
$ total_value <dbl> 168542.40, 10000.00, 16260.00, 3285.00, 62627.00, 65…
$ control_type <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
6.4 Filter to data of interest
For this analysis we only want to look at “controlled” items, and only those items given to Texas agencies. We’ll build these filters separately so we can save the data at each step.
6.4.1 Get the controlled items
For this analysis we want to focus on “controlled” items vs the more generic non-controlled items we learned about in the documentation. Let’s filter to capture just the controlled data for our analysis.
As you might recall from the Billboard project, the filter()
function is our workhorse for focusing our data. In our import notebook we created our control_type
column so we could do exactly this: Find only the rows of “controlled” items.
- Start a new Markdown section and note you are getting controlled items.
Let’s first count how many of these items are controlled vs. non-controlled. This is much like counting songs by performer: We can use GSA and n()
to count the number of rows for control_type
, the column we created in the cleaning notebook.
- Try to write a quick count of the
leso
data that usesgroup_by()
on thecontrol_type
column and summarizes usingn()
.
Try yourself. The result is below.
|>
leso group_by(control_type) |>
summarise(number_items = n())
We can see from this result that there are 7,257 items that are not controlled, and 57,915 that are. When we write our filter, we should end up with 57,915 observations, which is only the TRUE
ones.
- Start with your
leso
data, but then filter it tocontrol_type == TRUE
. - Save the result into a new tibble called
leso_c
.
Filtering control type
<- leso |>
leso_c filter(control_type == TRUE)
To test that this worked you could do the same GSA on control_type
we used above, or even glimpse()
which shows the number of rows.
If you really wanted to just count the number of rows, there is another function, nrow().
|> nrow() leso_c
[1] 57915
At this point you have a new tibble called leso_c
that has only the weapons and other controlled property, so now we can take a closer look at that data.
6.4.2 Filter for Texas
In this analysis we only want to look at Texas data so we’ll make a new object with just that. Remember, you want to do each step and run it to make sure it is working.
- Create a new section with headlines and text that denote you are filtering the data to Texas.
- Create the code chunk and start your filter process using the
leso_c
tibble. - Use
filter()
on thestate
column to keep all rows with “TX”. - Edit the chunk to put all that into a new object called
leso_c_tx
. - Print out the tibble.
You know the drill by now
<- leso_c |>
leso_c_tx filter(
== "TX"
state
)
leso_c_tx
How do you know if it worked? Well the first column in the data is the state
column, so they should all start with “TX”. We are also down to 4,740 rows.
6.5 Building summaries with math
As we get into the first quest, let’s talk about “how” we go about these summaries.
When I am querying my data, I start by envisioning what the result should look like. Let’s take the first question: How many total “controlled” items were transferred to Texas agencies, and what are they all worth?
Let’s glimpse the data …
|> glimpse() leso_c_tx
Rows: 4,740
Columns: 13
$ state <chr> "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX"…
$ agency_name <chr> "ABERNATHY POLICE DEPT", "ABERNATHY POLICE DEPT", "A…
$ nsn <chr> "2320-01-371-9584", "1240-01-540-3690", "1005-00-856…
$ item_name <chr> "TRUCK,UTILITY", "SIGHT,REFLEX", "RIFLE,5.56 MILLIME…
$ quantity <dbl> 1, 5, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1…
$ ui <chr> "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
$ acquisition_value <dbl> 62627.00, 365.00, 120.00, 120.00, 96466.00, 120.00, …
$ demil_code <chr> "C", "Q", "D", "D", "C", "D", "D", "D", "C", "D", "Q…
$ demil_ic <dbl> 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, NA, NA, NA, 7, 1, 1…
$ ship_date <dttm> 2016-03-07, 2016-02-02, 2011-09-13, 2011-09-13, 202…
$ station_type <chr> "State", "State", "State", "State", "State", "State"…
$ total_value <dbl> 62627.00, 1825.00, 120.00, 120.00, 96466.00, 120.00,…
$ control_type <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
… and then talk through the logic:
- “How many total controlled items” is how many individual things were transferred. We have this
quantity
column that has the number of items in each row, so if we want the total for the data set, we can add together all the values in that column. We do this within asummarize()
but instead of counting rows usingn()
, we’ll use the functionsum(quantity)
which will add all the values inquantity
column together. - “… what are they all worth” is very similar, but we want to add together all those values in our
total_value
column. (Remember, we don’t want to useacquisition_value
because that is the value of only ONE item, not the total for the row.
6.5.1 Summarize across the data
So, let’s put this together with code.
- Start a new Markdown section that you are getting total values.
- Add a code chunk like below and run it.
- 1
-
We start with the
leso_c_tx
tibble of the controlled items then we pipe intosummarize()
. - 2
- Because we are going to add multiple things, I put them on separate lines just to make this more readable.
You’ll notice that the names of the columns are the function names. We can “name” our new columns just like we did in Billboard. We could call this whatever we want, but good practice is to name it what it is. We’ll use good naming techniques and split the words using _
. I also use all lowercase characters.
- Edit your chunk to add in the new column names and run it.
- 1
-
Note we added
summed_quantity =
on this line - 2
-
and
summed_total_value =
on this line.
OK, from this we have learned something: The summed_quantity
value there is the number of items all Texas law enforcement agencies have received, and the summed_total_value
is how much they were worth.
6.5.2 Data Takeaway: Totals since 2010
- Write a Data Takeaway sentence in your notebook summing up the totals in a way that you could drop into a story.
6.5.3 NA values in a sum, mean and median
We’ll do summarize again below with grouping, but first we need to talk about NA
values.
When we do math like this within summarize we need to take special note if our column has any blank values, called NA
, as in “Not Available”. If there are, then you will get NA
for the result. R will NOT do the math on the remaining values unless you tell it so. This is true not only for sum()
, but also for mean()
which gets an average, and for median()
which finds the middle number in a column.
There is a way to get around this by including an argument within the mathy function: sum(col_name, na.rm = TRUE)
.
I can show this with the demil_ic
column which is a number datatype with some NA values. To be clear, the demil_ic
variable isn’t really designed to do math on it as it is really a category, but it will show what I’m talking about here.
# You don't have to add this to your notebook.
# I'm just explaining concepts
|>
leso_c_tx summarise(
dumb_sum = sum(demil_ic),
less_dumb_sum = sum(demil_ic, na.rm = TRUE),
dumb_avg = mean(demil_ic),
less_dumb_avg = mean(demil_ic, na.rm = TRUE)
)
So there you have examples of using sum()
and mean()
with and without na.rm = TRUE
. OK, you’ve been warned.
6.6 Totals by agency
OK, your next question is this: For each agency, how many things did they get and how much was it all worth?
The key part of thinking about this logically is For each agency. That “for each” is a clue that we need group_by()
for something. We basically need what we did above, but we first need to group_by the agency_name
.
Let’s break this question down:
- “For each agency” tells me I need to group_by the
agency_name
so I can summarize totals within each agency. - “how many total things” means how many items. Like before, we have the
quantity
variable, so we need to add all those together within summarize like we did above. - “how much was it worth” is another sum, but this time we want to sum the
total_value
column
So I envision my result looking like this:
agency_name | summed_quantity | summed_total_value |
---|---|---|
AFAKE POLICE DEPT | 6419 | 10825707.5 |
BFAKE SHERIFF’S OFFICE | 381 | 3776291.52 |
CFAKE SHERIFF’S OFFICE | 270 | 3464741.36 |
DFAKE POLICE DEPT | 1082 | 3100420.57 |
The first column in that summary will be our grouped values. This example is only grouping by one thing, agency_name
. The other two columns are the summed values I’m looking to generate.
6.6.1 Group_by, then summary with math
We’ll start with the total_quantity.
- Add a new section (headline, text and chunk) that describes the second quest: For each agency in Texas, find the summed quantity and summed total value of the equipment they received.
- Add the code below into the chunk and run it.
- 1
-
We start with the
leso_c_tx
, which is the “controlled” data, and then … - 2
-
We group by
agency_name
. This organizes our data (behind the scenes) so our summarize actions will happen within each agency. Now I normally say run your code one line at a time, but you would not be able to see the groupings at this point, so I usually writegroup_by()
andsummarize()
together. - 3
-
In
summarize()
we first name our new column:summed_quantity
, then we set that column to equal=
the sum of all values in thequantity
column.sum()
is the function, and we feed it the column we want to add together:quantity
. I put this inside of the summarize function in its own line because we will add to it. I enhances readability. RStudio will help you with the indenting, etc.
If you look at the first line of the return, it is taking all the rows for the “ABERNATHY POLICE DEPT” and then adding together all the values in the quantity
field.
6.6.2 Add the total_value
We don’t have to stop at one summary. We can perform multiple summarize actions on the same or different columns within the same expression.
Edit your summary chunk to:
- Add add a comma after the first summarize action.
- Add the new expression to give us the
summed_total_value
and run it.
|>
leso_c_tx group_by(agency_name) |>
summarize(
summed_quantity = sum(quantity),
1summed_total_value = sum(total_value)
)
- 1
- Here we add the summed_total_value.
6.6.3 Check the math
If you wanted to test this (and that is a real good idea), you might look at the data from one of the values and check the math. Here are the Abernathy rows. I usually do these tests in a code chunk of their own, and sometimes I delete them after I’m sure my logic works.
|>
leso_c_tx filter(agency_name == "ABERNATHY POLICE DEPT") |>
select(agency_name, quantity, total_value)
If we look at the quantity
column there we can eyeball all the rows and add them on a calculator. Our answer should match our summed_quantity
in the summary table.
If we add up the total_value
rows then we should end up with the same number as summed_total_value
above.
6.6.4 Arrange the results
OK, this gives us our answers, but in alphabetical order. We want to arrange the data so it gives us the most summed_total_value
in descending order.
- EDIT your block to add an
arrange()
function below
|>
leso_c_tx group_by(agency_name) |>
summarize(
summed_quantity = sum(quantity),
summed_total_value = sum(total_value)
|>
) 1arrange(summed_total_value |> desc())
- 1
- Adding the pipe and arrange here.
So now we’ve sorted the results to put the highest summed_total_value
at the top.
Remember, there are two ways we can set up that arrange()
function in descending order:
arrange(summed_total_value |> desc())
arrange(desc(summed_total_value))
Both work and are correct. It really is your preference.
6.6.5 Consider the results
Is there anything that sticks out in that list? It helps if you know a little bit about Texas cities and counties, but here are some thoughts to ponder:
- Houston is the largest city in the state (4th largest in the country, in fact). It makes sense that it tops the list. Same for Harris County or even the state police force. Austin being up there is also not crazy, as we have almost a million people.
- But what about San Marcos (pop. 63,220)? Or Milam County (pop. 24,770)? Those are way smaller cities and law enforcement agencies. They might be worth looking into.
Perhaps we should look at some of the police agencies closest to us.
6.6.6 Data Takeaway: Texas totals
- After your last code chunk here, write a Data Takeaway from what you see in this result.
6.7 Looking a local agencies
Our second quest had a second part: How about local police agencies? What was the total quantity and value?
We’ll take the summary above, but then filter it to show only local agencies of interest.
6.7.1 Save our “by agency” list
Since we want to take an existing summary and add more filtering to it, it makes sense to go back into that chunk and save it into a new object so we can reuse it.
- EDIT your existing summary chunk to save the result into a new tibble. Name it
tx_agency_totals
so we are all on the same page. - Add a new line that prints the result to the screen so you can still see it.
1<- leso_c_tx |>
tx_agency_totals group_by(agency_name) |>
summarize(
summed_quantity = sum(quantity),
summed_total_value = sum(total_value)
|>
) arrange(summed_total_value |> desc())
2 tx_agency_totals
- 1
-
Creating the new object
tx_agency_totals
to put this summary into. - 2
- Here we print out the new object, which we can reuse later.
6.7.2 Filtering within a vector
Let’s talk through the filter concepts before you try it with this data.
When we talked about filtering with the Billboard project, we discussed using the |
operator as an “OR” function. If we were to apply that logic here, it would look like this:
# Just for explanation. Don't add to notebook.
|>
data filter(column_name == "Text to find" | column_name == "More text to find")
That can get pretty unwieldy if you have more than a couple of things to look for.
There is another operator %in%
where we can search for multiple items from a list. (This list of items is officially called a “vector”.) Think of it like this in plain English: Filter the column for things in this list.
# Just for explanation. Don't add to notebook.
|>
data filter(col_name %in% c("This string", "That string"))
We can take this a step further by saving the items in our list into an R object so we can reuse that list and not have to type out all the terms each time we use them.
# Just for explanation. Don't add to notebook.
<- c(
list_of_strings "This string",
"That string"
)
|>
data filter(col_name %in% list_of_strings)
This last version is the easiest to read and the most flexible to maintain. We can make adjustments to the list as needed.
6.7.3 Create a vector to build this filter
In the interest of time, I’m going to give you the list of local police agencies to filter on. To be clear, I did considerable work to figure out the exact names of these agencies. I consulted a different data set that lists all law enforcement agencies in Texas and then I used some creative filtering to find their “official” names in the leso data. It also helps that I’m familiar with local cities and counties so I can recognize the names. I don’t want to get sidetracked on that process here so I’ll give you the list and show you how to use it.
- Create a new section (headline, text and chunk) and describe you are filtering the summed quantity/values for local agencies.
- Add the code below into that chunk.
1<- c(
local_agencies 2"AUSTIN PARKS POLICE DEPT", #NI
"AUSTIN POLICE DEPT",
"BASTROP COUNTY SHERIFF'S OFFICE",
"BASTROP POLICE DEPT",
"BEE CAVE POLICE DEPT",
"BUDA POLICE DEPT",
"CALDWELL COUNTY SHERIFFS OFFICE",
"CEDAR PARK POLICE DEPT",
"ELGIN POLICE DEPARTMENT",
"FLORENCE POLICE DEPT", #NI
"GEORGETOWN POLICE DEPT",
"GRANGER POLICE DEPT", #NI
"HAYS CO CONSTABLE PRECINCT 4",
"HAYS COUNTY SHERIFFS OFFICE",
"HUTTO POLICE DEPT",
"JARRELL POLICE DEPT", #NI
"JONESTOWN POLICE DEPT", #NI
"KYLE POLICE DEPT",
"LAGO VISTA POLICE DEPT",
"LAKEWAY POLICE DEPT",
"LEANDER POLICE DEPT",
"LIBERTY HILL POLICE DEPT", #NI
"LOCKHART POLICE DEPT",
"LULING POLICE DEPT",
"MANOR POLICE DEPT",
"MARTINDALE POLICE DEPT", #NI
"PFLUGERVILLE POLICE DEPT",
"ROLLINGWOOD POLICE DEPT", #NI
"SAN MARCOS POLICE DEPT",
"SMITHVILLE POLICE DEPT", #NI
"SUNSET VALLEY POLICE DEPT", #NI
"TAYLOR POLICE DEPT", #NI
"THRALL POLICE DEPT", #NI
# TEXAS STATE UNIVERSITY HI_ED
"TRAVIS COUNTY SHERIFFS OFFICE",
# TRAVIS CONSTABLE OFFICE,
# SOUTHWESTERN UNIVERSITY HI_ID
"WESTLAKE HILLS POLICE DEPT", #NI
"UNIV OF TEXAS SYSTEM POLICE HI_ED",
"WILLIAMSON COUNTY SHERIFF'S OFFICE"
)
3|>
tx_agency_totals 4filter(agency_name %in% local_agencies)
- 1
-
We start by giving our list of agencies a name:
local_agencies
. This creates an R object that we can reuse. We will need this list a number of times, and it makes sense to manage it in once place instead of each time we need it. - 2
-
Next we fill that object with a list of agency names. Again, I did some pre-work to figure out those names that we aren’t covering here, and in some cases there is a comment
#NI
next to them. That is a note to myself that the particular agency is NOT INCLUDED in our data, which means I haven’t confirmed the name spelling. It could be at a later date the Austin Parks department gets equipment, but they list their name as “CITY OF AUSTIN PARKS POLICE” instead of “AUSTIN PARKS POLICE DEPT” and it would not be filtered properly. This is another example that your most important audience for your code is your future self. - 3
-
Now that our list is created, we can use it to filter our
tx_agency_totals
data. So, we start with that data, and then … - 4
-
We pipe into
filter()
, but inside our filter we don’t set it==
to a single thing, instead we say:agency_name %in% local_agencies
, which says look inside thatagency_name
column and keep any row that has a value that is also in ourlocal_agencies
vector.
6.7.4 Data Takeaway: Local totals
This filters our list of agencies to just those in Central Texas. Do you see anything interesting there?
- Write your Data Takeaway about what you’ve found.
6.8 Types of items shipped to each agency
Now that we have an overall idea of what local agencies are doing, let’s dive a little deeper. It’s time to figure out the specific items that they received.
Our question is this: What specific “controlled” items did each agency get and how much were they worth?
In some cases an agency might get the same item shipped to them at different times. For instance, “AUSTIN POLICE DEPT” has multiple rows where they get a single “ILLUMINATOR,INTEGRATED,SMALL ARM” shipped to them on the same date, but then on other dates they have the same item but the quantity is 30. We want all of these “ILLUMINATOR,INTEGRATED,SMALL ARM” for the Austin police added together into a single record.
The logic works like this:
- Start with the controlled data, and then …
- Group by the
agency_name
anditem_name
, which will group all the rows where those values are the same. All “AUSTIN POLICE DEPT” rows with “ILLUMINATOR,INTEGRATED,SMALL ARM” will be considered together, and then … - Summarize to sum the
quantity
, and then do the same fortotal_value
.
The code for this is very similar to what we did above when we summaries agencies, except we are grouping by two things, the agency_name and the item_name. Let’s do it:
- Create a new section (headline, text and code chunk) and describe that you are finding the sums for each item that each agency has received since 2010.
- Consult (or even copy) the code you wrote when you created the agency totals, but modify the
group_by()
to add theitem_name
, like this:group_by(agency_name, item_name)
. - Be sure you rename your created R objects, too, perhaps to
tx_agency_item_totals
.
Give it a go. Below is the result.
- 1
- Be sure to create a new object
- 2
- Here we add the second grouping variable
This reuse of code like this – copying the agency grouping code and editing it to add the item_name value – is very common in coding, and there is nothing wrong with doing so as long as you are careful.
When you reuse code, review it carefully so you don’t override things by accident. In this instance, our original code created an R object: tx_agency_totals <- leso_c_tx |> ...
that holds the result of our functions, and we call that later to view it. If we reuse this code and don’t update that object name, we’ll reset the values inside that already-existing object, which was not our intent. We want to create a NEW thing tx_agency_item_totals
so we can use that later, too. And if we don’t update the “peek” at the object, we’ll be looking at the old one instead of the new one.
With that last code chunk you might see a warning in your R Console: summarise()
has grouped output by ‘agency_name’. You can override using the .groups
argument. This is not a problem, it’s just a reminder that when we group by more than one thing, the first grouping is retained when future functions are applied to this result. It’s more confusing than helpful, to be honest. Just know if we wanted to do further manipulation, we might need to use ungroup()
first.
6.8.1 Items for local agencies
Just like we did for our agency totals, we want to filter this list of items to those sent to our local agencies. However, this time we’ve already created the list of local agencies so we don’t have to redo that part … we just need to filter by it.
- Start a new section (headline, text) and explain that you are looking at items sent to local agencies.
- Use
filter()
to focus the data just on orlocal_agencies
.
|>
tx_agency_item_totals filter(agency_name %in% local_agencies)
Because our original list arranged the data by the most expensive items, we can see that here. But it might be easier to rearrange the data by agency name first, then the most expensive items.
- EDIT your chunk to add an arrange function.
- Within the arrange, set it to
agency_name
first, thensummed_total_value
in descending order.
|>
tx_agency_item_totals filter(agency_name %in% local_agencies) |>
arrange(agency_name, desc(summed_total_value))
6.8.2 Research some interesting items
You’ll want a little more detail about some of these items for your data drop. I realize (and you should, too) that for a “real” story we would need to reach out to sources for more information, but you can search online to learn enough to at least describe some of these items. There are a couple of ways to go about researching items.
- Simply search for the items on Google, like this.
- Each item has a “National Stock Number,” which is an ID for a government database of supplies. You can search the data for the
nsn
value and then look up that value online.
Let do an example, looking up “ILLUMINATOR,INTEGRATED,SMALL ARMS”. First we filter the data to find the item. (I’m also using select so we can control the output and see it in this book, but you might not want that in your notebook so you can also check ship_dates
, etc.)
|>
leso_c_tx filter(
== "ILLUMINATOR,INTEGRATED,SMALL ARMS",
item_name == "AUSTIN POLICE DEPT"
agency_name |>
) select(item_name, nsn)
It looks like most of these illuminators use this nsn
: 5855-01-534-5931.
Now we go to the website https://nationalstocknumber.info/ and plug that number into the search bar. We get a couple of returns, and if we click on one we get this page. It gives us a description of the item:
“A device which is a combination of several lasers and white light illumination used to provide multiple capabilities for engaging targets and providing light. The device may contain a flashlight or other white light illumination source, an illuminator, infrared and stand alone aiming lasers/pointers. The device has the capability to mount on an individual weapon.”
Not every item is in the database, but it is worth checking.
6.8.3 Data Takeaway: Local items
Now that you’ve done a bit of research into some of these items, write prose-style data takeaways about items of interest.
6.9 Update index summary
I know there has been a lot of new code here, but learning the code is only half the battle. What have we learned about the data? It’s time to summarize our findings.
- Open your
index.qmd
file. - After the text that is here, pick the most interesting data fact you’ve found and write a story lede and source graf.
- Follow that with the rest of your data takeaways, written as prose that you could drop into a story.
If you need a reminder of the style, look at the Billboard example
6.10 Turn in your project
- Make sure everything runs and Renders properly.
- Publish your changes to Quarto Pub and include the link to your project in your index notebook so I can bask in your glory.
- Zip your project folder. (Or export to zip if you are using posit.cloud).
- Upload to the Canvas assignment.
To be clear, it is your zipped project I am grading. The Quarto Pub link is for convenience.
6.11 What we learned in this chapter
- We used
sum()
within agroup_by()
/summarize()
function to add values within a column. Sometimessum()
needs the argumentna.rm = TRUE
. - We learned how to use
c()
to combine a list of like values into a vector, and then used that vector to filter a column for values%in%
that vector. - We used
nrow()
to count the number of rows in an object.