Appendix E — Bash files

This is an extended lesson on bash that is not always assigned, but is still useful.

E.1 Goal

Let’s learn some more command-line juju for working with files. These commands will help look at code files on your computer.

E.2 Set up

Let’s make sure we are in our class folder. (This is review from Moving Around).

Make sure you are in your icj project folder:

cd ~/Documents/icj

Create a new folder called myproject:

mkdir myproject

Use cd to move inside the myproject folder then use pwd to make sure you are in the right place. The result should be something like this:

$ cd
Users/ccm346/Documents/icj/myproject

E.3 curl

We need some text to work with, so we’re going to pull down some text from Github. I might as well explain what we are doing.

curl is a command to transfer files. I think of it as “Capture URL”. We need to give curl a couple of flags (or options) for this job:

-L stands for “Location”. It allows curl to follow a URL if it is redirected.
-o for “output”. So we can write this to a file (which we are calling data.csv) instead of our terminal window. (We could use -O instead to just use the current file name.)

Warning

This URL below needs to be updated

curl -L -o data.csv https://raw.githubusercontent.com/utdata/icj-class/main/resources/data_example.csv

OK, now we should be able to ls and see our file is there. My output looks like this:

$ ls
data.csv

E.4 head

head allows you to print the top of a file to your screen so you can see what it is. It will default to show you the first 10 lines of a file. When you type this in, hit tab after you type “head da” to let tab completion help you.

head data.csv

will give you this:

$ head data.csv
Quarter,Taxpayer Number,Taxpayer Name,Taxpayer Address,Taxpayer City,Taxpayer State,Taxpayer Zip Code,Taxpayer County,Outlet Number,Location Name,Location Address,Location City,Location State,Location Zip Code,Location County,Location Room Capacity,Location Tot Room Receipts,Location Taxable Receipts
Q1,32051871906,DSN HOSPITALITY LLC,4710 S LAMAR BLVD,AUSTIN,TX,78745,227,00001,DSN HOSPITALITY LLC,3110 STATE HIGHWAY 71 EAST,AUSTIN,TX,78745,011,37,91205.03,90870.01
Q1,32054409241,JEANETTE WELSHE,13801 EVERGREEN WAY,AUSTIN,TX,78737,105,00001,BED AND BREAKFAST,13801 EVERGREEN WAY,AUSTIN,TX,78737,105,4,5417.92,5417.92
Q1,32047098168,AMY MARIE CAPUTO,13601 PAISANO CIR,AUSTIN,TX,78737,105,00001,FLORA PROPERTIES/AMY M. CAPUTO,13601 PAISANO CIR,AUSTIN,TX,78737,105,4,7280.23,7280.23
Q1,32055460730,NATHANIEL R BAUERNFEIND,163 KINLOCH CT,AUSTIN,TX,78737,105,00001,NATHANIEL R BAUERNFEIND,163 KINLOCH CT,AUSTIN,TX,78737,105,1,4735.0,4735.0
Q1,32049290466,SHARON K FOSTER,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,00001,NUTTY BROWN CABIN,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,1,1030.0,1030.0
Q1,32049290466,SHARON K FOSTER,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,00003,NUTTY BROWN MANOR,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,1,2100.0,2100.0
Q1,32049290466,SHARON K FOSTER,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,00004,"ROADRUNNER'S BUNGALOQ",12932 NUTTY BROWN RD,AUSTIN,TX,78737,105,1,0.0,0.0
Q1,32020638758,LESLIE K RENFRO,12803 SHOSHONI TRL,AUSTIN,TX,78737,105,00002,THISTLE HILL STUDIO,12803 SHOSHONI TRL,AUSTIN,TX,78737,105,1,871.0,871.0
Q1,32050668345,"TIPPING T, LLC",4405 MANZANILLO DR,AUSTIN,TX,78749,227,00001,TIPPING T,13127 FITZHUGH RD,AUSTIN,TX,78736,105,1,9427.0,9427.0

It might look like more than 10 lines on your screen because they wrap.

If you want to specify how many lines to display, use the flag -n for number of lines:

head -n 2 data.csv

That will give you two lines, the first which is the header of that file.

E.5 tail

tail shows you the bottom of the file. It takes the same -n flags.

tail data.csv

That result won’t show you the header, because you are looking at the last 10 lines of the file.

E.6 wc

wc I think of as “word count”, but it can also count lines and bytes.

wc data.csv

gives you this output:

$ wc data.csv
     100    1136   15642 data.csv

The first column is number of lines, then the number of words, then bytes.

If you want just the number of lines, us -l.

wc -l data.csv

E.7 cat

cat means to concatenate and print a file to your window. If you feed it two file names, it will give you the first, then the other. Do this:

cat data.csv

This will print the contents of data.csv to your screen. It’s showing all 100 lines.

But what you can do is redirect that output into a file by using >. Do this.

cat data.csv > file01.csv

What you’ve done is take the contents of data.csv, printed the contents and then redirected that content into another file called file01.csv. Since that didn’t exist already, it was created on the fly.

If I wanted to take two files, file01.csv and file02.csv, and then combine them into a single file on your computer, it would look like this. (You don’t have to do these commands, just understand them.):

cat file01.csv file02.csv > combined.csv

Now, combined.csv would be the combination of both files.

E.8 grep

grep is for using regular expressions to find patterns within a file. It takes a regular expression input and the file name and gives in return the lines of the file that match that regular expression.

grep 'ATX INVESTMENTS' data.csv

… will print out just the lines in data.csv that have ‘ATX INVESTMENTS’ somewhere in them. Note that the name in the field is ‘ATX INVESTMENTS LLC’ but it found it with just part of that.

If you want to just know how many lines there are with ‘ATX INVESTMENTS’, use the -c flag for count:

$ grep -c 'ATX INVESTMENTS' data.csv

The answer should be 4.

E.9 Piping commands

You can “pipe” the results of one command into another command with the | character, which you’ll find as the shift of your backslash key. You can string commands together with that, so if I just wanted to see the first lines that has ‘ATX INVESTMENTS’, I can do this:

grep 'ATX INVESTMENTS' data.csv | head -n 1