Before we dive into RStudio and programming and all that, I want to show you where we are heading, so you can “visualize success”. As I wrote in the book intro, I’m a believer in Scripted Journalism … data journalism that is repeatable, transparent and annotated. As such, the whole purpose of this book is to train you to create documents to share your work.
The best way to explain this is to show you an example.
This is a website with all the code from a data journalism project. If you click on the navigation link for Cleaning you can read where the data come from and see all the steps I went through – with code and explanation – to process the data so I could work with it. And in the Analysis 2023 notebook you’ll see I set out with some questions for the data, and then I wrote the code to find my answers. Along the way I wrote explanations of how and why I did what I did.
This website was created using Quarto and R, and the tool I used to write it was the RStudio IDE. Here’s the crazy part: I didn’t have to write any HTML code, I just wrote my thoughts in text and my code in R. With a few short lines of configuration and a two-word command, `quarto publish`, I was able to publish my work on the Internet for free.
Keep this in mind:
Creating shareable work is our goal for every project. Let’s get started.
If you are using the online posit.cloud version of RStudio, then some steps have to be done differently, mainly in dealing with project creation and exporting. I’ll try to note when I’m aware of differences, but may not go into great detail in the book. Happy to do so in class.
In this case, you need to go to your posit.cloud account and then use the blue New Project button to launch a new RStudio project, and then continue below.
When you launch RStudio, you’ll get a screen that looks like this:
There are some preferences in RStudio that I would like you to change. By default, the program wants to save the state of your work (all the variables and such) when you close a project, but that is typically not good practice. We’ll change that.
Next we will set some values in the Code pane.
We’ll get into why we did this part later.
R is an open-source language, which means that other programmers can contribute to how it works. It is what makes R beautiful.
What happens is developers will find it difficult to do a certain task, so they will write code that solves that problem and save it into an R “package” so they can use it later. They share that code with the community, and suddenly the R garage has an “ultimate set of tools” that would make Spicoli’s dad proud.
One set of these tools is the tidyverse developed by Hadley Wickham and his team at Posit. It’s a set of R packages for data science that are designed to work together in similar ways. Prof. Lukito and I are worshipers of the tidyverse worldview and we’ll use these tools extensively. While not required reading, I highly recommend Wickham’s book R for data science, which is free.
There is also a series of useful tidyverse cheatsheets that can help you as you use the packages and functions from the tidyverse. We’ll refer to these throughout the course.
We will use these tidyverse packages extensively throughout the course. We’ll install some other packages later when we need them.
There are two steps to using an R package:
- Install the package with `install.packages("package_name")`. You only have to do this once for each computer, so I usually do it using the R Console instead of in a notebook.
- Load the package with `library(package_name)`. This has to be done for each notebook or script that uses it, so it is usually one of the first things you’ll see in a notebook.
You use “quotes” around the package name when you are installing, but you DON’T need quotes when you load the library.
We need to install some packages before we can go further. To do this, we will use the Console, which we haven’t talked about much yet.
install.packages(c("quarto", "rmarkdown", "tidyverse", "janitor"))
You’ll see a bunch of responses fly by in the Console. It’s probably all fine unless the last response ends with an error.
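If the Console output scrolls by too fast to read, one way to double-check the result is to ask R which of the four packages are still missing. This is a minimal sketch using base R’s `installed.packages()`; the variable names `wanted`, `have` and `still_missing` are just illustrative.

```r
# The packages requested in the install.packages() call above
wanted <- c("quarto", "rmarkdown", "tidyverse", "janitor")

# installed.packages() returns a matrix with one row per installed package;
# its row names are the package names
have <- rownames(installed.packages())

# setdiff() keeps the names in `wanted` that are not in `have`
still_missing <- setdiff(wanted, have)
still_missing
```

An empty result, shown as `character(0)`, means everything installed. Any names that do print are the ones to install again.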
If you are using the RStudio IDE app on your computer, you only have to do this `install.packages()` move once. However, if you are using the online posit.cloud version of RStudio, you’ll have to do this for each new project because each project is a new virtual computer. I’ve included a posit.cloud cheatsheet here.
OK, we’re done with all the computer setup. Let’s get to work.
This is a case where posit.cloud differs greatly. See the posit.cloud Appendix to see how to build a new Quarto Website project.
When we work in RStudio, we will create “Projects” to hold all the files related to one another. This sets the “working directory”, which is a sort of home base for the project.
`+R` hexagon sign.
`rwd` folder we created earlier.
I want you to be anal retentive about naming your folders. It’s a good programming habit.
When you hit Create Project, your RStudio window will refresh and two things will happen:
Let’s walk through the files created and explain what they are in order of importance.
- The `_quarto.yml` file is a YAML configuration file for your project. This allows us to set publishing rules for our Quarto project. We might not get too much into all the options, but you can read more about it in the Quarto Guide if you like.
- The `index.qmd` file is a Quarto document that makes the “home page” of your website. In this book you’ll use the index to describe your project and record other important information related to it. Each Quarto file you create can become a new page in your website, and they all end in `.qmd`, which stands for Quarto Markdown. (We might also encounter and edit `.Rmd` files, which are very similar, but just a little less awesome.)
- The `about.qmd` file is another Quarto document created as a placeholder. We’ll usually rename/reuse that or delete it.
- The `christian-first-project.Rproj` file is your Project file. It sets the working directory or what is essentially “home base” for your project. When you go to open your project again, this is the file you will open.
- The `styles.css` file is where you can exert extra control over your website. We won’t use it.
The big thing to remember is this: Most of our work is done in files with the `.qmd` suffix.
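To see the “home base” idea in action, you can ask R from the Console where it thinks it is and which Quarto files live there. This is a small sketch using base R functions; the path and file names you get back will depend on your own project, and `qmd_files` is just an illustrative variable name.

```r
# Print the working directory; in an open project this is the project folder
getwd()

# List the Quarto documents in the working directory
# (the pattern is a regular expression matching names that end in .qmd)
qmd_files <- list.files(pattern = "\\.qmd$")
qmd_files
```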
The document that opened on the left is our Quarto document where we will do our work. The Quarto document is a cutting-edge way of authoring programming documents where the result can be output in a myriad of formats: As HTML, PDFs, slideshows, books, etc. In fact, this very book you are reading is written in R using Quarto.
I could write reams of text about the birth of Quarto and the evolution of what came before it, but you won’t really care. Let’s break this down to what you need to know:
Think of Render like exporting or publishing a pretty version of your work.
Let’s see what this basic document looks like when we Render it.
This should open up the Console below the document, where you’ll see a bunch of feedback. On the right side of RStudio, you should end up with the Viewer pane that shows your document. Here is a tour of sorts:
For the posit.cloud users, this will launch in a new web browser window. RStudio users can do the same, if they want, through a button in the Viewer toolbar.
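The Render button is how we’ll do this in class, but the same step can also be run as code. The sketch below assumes the `quarto` R package (installed earlier) and the Quarto program itself are available, and that an `index.qmd` file sits in the working directory; it does nothing if any of those is missing. The variable name `can_render` is only for illustration.

```r
# Check the three things rendering depends on before trying it
can_render <- requireNamespace("quarto", quietly = TRUE) &&
  !is.null(quarto::quarto_path()) &&
  file.exists("index.qmd")

if (can_render) {
  # Render one document into its pretty, shareable version
  quarto::quarto_render("index.qmd")
} else {
  message("Skipping render: quarto or index.qmd was not found.")
}
```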
The top of our document has what we call the metadata of our document written in YAML. These are commands to control the output of our document. Right now it only has the title of our document.
---
title: "christian-first-project"
---
When we created our project, RStudio added the name of our folder here, which kinda sucks because that is not what the title is for. It should describe a title for your project like the top of a Microsoft Word or Google Docs document.
---
title: "Christian's First Quarto project"
---
The next bit of our document shows an R chunk:
This is where we write and execute our programming code. Let’s break down parts of this:
- The `{r}` part notes that this is R code. (Quarto supports other languages like `{python}`, but we’ll stick with R.)
- The `1 + 1` part is the code. In this case, it is some basic math. We do cooler stuff later.
Let’s run this chunk of code to see what happens.
Doing this executes the code that is in the R block and it will print the result to your document below the chunk. You might have noticed something similar when you rendered your document earlier.
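For reference, here is that same line with the result you should see. The `#>` line is a comment showing the expected output, not something you type.

```r
# Basic arithmetic: R evaluates the expression and prints the answer
1 + 1
#> [1] 2
```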
There are about five keyboard commands that I will implore you to learn. Here are the first three. Remember, if you are on a PC, use Ctrl instead of Cmd.
Remember the goal of all our work here in this class is to use data analysis to make sense of the world and then explain to others what we are doing. There are multiple parts to that:
Our goal with this notebook is to discover what is the socially-acceptable age to date someone older or younger than ourselves.
## My upper dating age
The following section details the [socially-acceptable maximum age of anyone you should date](https://www.psychologytoday.com/us/blog/meet-catch-and-keep/201405/who-is-too-young-or-too-old-you-date).
The math works like this:
- Take your age
- subtract 7
- Double the result
Let’s walk through this Markdown code. You might bookmark the Markdown basics so you can refer back to it as you learn.
- The `##` line is a “Header 2” headline, meaning it is the second biggest. (The title is an H1.) Add more hashmarks, like `###`, and you get a smaller headline, like subheads, etc.
- A hyperlink is written as `[words to link](https://the_url.org)`.
- The `-` at the beginning of a line creates a bullet list. (You can also use `*`.) Those lines need to be one after another without blank lines.
I should note at this point there is a “Visual” editor where RStudio gives you a more formatted look as you’re editing while it writes Markdown under the hood. I want you to use the “Source” editor so you can see and learn the underlying Markdown syntax. All my examples will be written in “Source” mode. After you pass this class, you can use “Visual” mode.
Let’s add a new R chunk to add the code that will calculate the maximum age of someone we should date.
# update 57 to your actual age
age <- 57
(age - 7) * 2
[1] 100
Congratulations! The answer given at the bottom of that code chunk is the upper end age of someone socially acceptable for you to date.
Throwing aside whether the formula is sound, let’s break down the code.
- `# update 57 to your actual age` is a comment. It’s a way to explain what is happening in the code without being considered part of the code. We create comments inside code chunks by starting with `#`. You can also add a comment at the end of a line. Comments will appear greyed out and italicized in your code chunks if formatted correctly.
- `age <- 57` assigns a number (`57`) to an R object/variable called `age`. A variable is a placeholder. It can hold numbers, text or even groups of numbers. Variables are key to programming because they allow you to change a value as you go along.
- `(age - 7) * 2` takes the value of `age` and subtracts `7`, then multiplies by `2`.
- The result is `[1] 100` in my case. The `[1]` marks the position of the first value printed on that line, and the value is “100”. For the record, my wife is much younger than that. Perhaps this formula breaks down when you get older ¯\_(ツ)_/¯.
Now you can play with the number assigned to the `age` variable to test out different ages. Do that.
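A variable can also hold a group of numbers, which lets you test several ages in one go. This sketch uses a hypothetical variable named `ages`, built with R’s `c()` function, and applies the same formula to each value.

```r
# c() combines values into a single vector (a group of numbers)
ages <- c(20, 30, 57)

# The formula is applied to every age in the vector
(ages - 7) * 2
#> [1]  26  46 100
```

Change the numbers inside `c()` to try ages of your own.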
Now, I want you to add a similar section that calculates the minimum age of someone you should date, but using the formula `(age / 2) + 7`.
- Use the `age` variable already established.
- Add `(age / 2) + 7` to the chunk.
Now you know the youngest a person should be that you date. FWIW, we don’t recreate the assignment of the `age` variable since we already have one. A Quarto document is designed to run from top to bottom so that all the pieces work together.
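Once you’ve tried it yourself, you can compare your notebook against this sketch of how the two chunks fit together when run from top to bottom. The names `upper_age` and `lower_age` are hypothetical, added here for illustration; the chunks in this chapter simply print the results.

```r
# Assigned once, near the top of the document
age <- 57

# Upper limit: subtract 7, then double
upper_age <- (age - 7) * 2
upper_age
#> [1] 100

# Lower limit: halve, then add 7 (reuses the same age variable)
lower_age <- (age / 2) + 7
lower_age
#> [1] 35.5
```

The two formulas mirror each other: run the lower-limit formula on `upper_age` and you get your own age back.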
One last thing to point out in the document window: The toolbar that runs across the top of the document window. The image below explains some of the more useful tools, but you REALLY should learn and use the keyboard commands instead.
We’ve been concentrating on editing the Quarto document, but let’s peek back at the Files pane to note some new files that were created.
Note there is a new folder there, `_site`. All your rendered versions go into that folder to make it easy to share or publish them.
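You can also peek inside that folder with code. This is a small sketch, assuming you have rendered at least once so that `_site` exists in your project folder; `site_files` is just an illustrative name.

```r
# Only list the folder if rendering has created it;
# otherwise fall back to an empty character vector
site_files <- if (dir.exists("_site")) list.files("_site") else character(0)
site_files
```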
With our next project, we’ll create some other folders to store our data and such.
Let’s publish our work to the Internet!!!!
- In the Terminal pane, type `quarto publish` and hit Return on your keyboard.
A bunch of stuff will happen in your Terminal pane. Here is a video showing this:
A couple of things about the video clip above:
You can learn more about Quarto pub here, including how to configure your profile page.
Try some things on your own! Go into the `about.qmd` page and write some things about yourself, like your favorite hobbies. Use the Markdown guide to write headlines, lists and maybe even try an image? (You can use a URL from an image on the web.)
- Publish your changes with the `quarto publish` command.
This is a bit different for posit.cloud users. See below.
The best way to turn in all of those files to Canvas is to compress the project folder into a single `.zip` file that you can upload to the assignment.
- Find your `Documents/rwd` folder.
- Compress the `yourname-final-project` folder.
- Upload the `.zip` file to the assignment for this week in Canvas.
If you find you make changes to your R files after you’ve zipped your folder, you’ll need to zip it again. Make sure you get the new version (or delete the old one first).
Because we are building “repeatable” code, I’ll be able to download your `.zip` files, uncompress them, and then re-run them to get the same results.
Well done!
You can export your project as a zip file from posit.cloud pretty easily. Just follow the directions on this screenshot:
Then submit the downloaded `.zip` file to Canvas.
- `install.packages()` downloads R packages to your computer. It is typically executed from within the Console and only once per computer. We installed a lot of packages including the tidyverse.
The Terminal is where you can send commands to your computer using text vs. pointing and clicking through “normal” actions. It’s super powerful and useful, but this is probably the only terminal command we’ll use.