Appendix E — Bash files
This is an extended lesson on bash that is not always assigned, but is still useful.
E.1 Goal
Let’s learn some more command-line juju for working with files. These commands will help you look at code and data files on your computer.
E.2 Set up
Let’s make sure we are in our class folder. (This is review from Moving Around).
- Make sure you are in your icj project folder:
cd ~/Documents/icj
- Create a new folder called myproject:
mkdir myproject
- Use cd to move inside the myproject folder, then use pwd to make sure you are in the right place. The result should be something like this:
$ pwd
/Users/ccm346/Documents/icj/myproject
E.3 curl
We need some text to work with, so we’re going to pull down some text from GitHub. I might as well explain what we are doing.
curl is a command to transfer files. I think of it as “Capture URL”. We need to give curl a couple of flags (or options) for this job:
- -L stands for “Location”. It allows curl to follow a URL if it is redirected.
- -o stands for “output”. It lets us write the download to a file (which we are calling data.csv) instead of our terminal window. (We could use -O instead to keep the file name from the URL.)
This URL below needs to be updated
curl -L -o data.csv https://raw.githubusercontent.com/utdata/icj-class/main/resources/data_example.csv
OK, now we should be able to ls and see our file is there. My output looks like this:
$ ls
data.csv
E.4 head
head allows you to print the top of a file to your screen so you can see what it is. It will default to show you the first 10 lines of a file. When you type this in, hit tab after you type “head da” to let tab completion help you.
head data.csv
will give you this:
$ head data.csv
Quarter,Taxpayer Number,Taxpayer Name,Taxpayer Address,Taxpayer City,Taxpayer State,Taxpayer Zip Code,Taxpayer County,Outlet Number,Location Name,Location Address,Location City,Location State,Location Zip Code,Location County,Location Room Capacity,Location Tot Room Receipts,Location Taxable Receipts
Q1,32051871906,DSN HOSPITALITY LLC,4710 S LAMAR BLVD,AUSTIN,TX,78745,227,00001,DSN HOSPITALITY LLC,3110 STATE HIGHWAY 71 EAST,AUSTIN,TX,78745,011,37,91205.03,90870.01
Q1,32054409241,JEANETTE WELSHE,13801 EVERGREEN WAY,AUSTIN,TX,78737,105,00001,BED AND BREAKFAST,13801 EVERGREEN WAY,AUSTIN,TX,78737,105,4,5417.92,5417.92
Q1,32047098168,AMY MARIE CAPUTO,13601 PAISANO CIR,AUSTIN,TX,78737,105,00001,FLORA PROPERTIES/AMY M. CAPUTO,13601 PAISANO CIR,AUSTIN,TX,78737,105,4,7280.23,7280.23
Q1,32055460730,NATHANIEL R BAUERNFEIND,163 KINLOCH CT,AUSTIN,TX,78737,105,00001,NATHANIEL R BAUERNFEIND,163 KINLOCH CT,AUSTIN,TX,78737,105,1,4735.0,4735.0
Q1,32049290466,SHARON K FOSTER,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,00001,NUTTY BROWN CABIN,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,1,1030.0,1030.0
Q1,32049290466,SHARON K FOSTER,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,00003,NUTTY BROWN MANOR,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,1,2100.0,2100.0
Q1,32049290466,SHARON K FOSTER,12932 NUTTY BROWN RD APT C,AUSTIN,TX,78737,105,00004,"ROADRUNNER'S BUNGALOQ",12932 NUTTY BROWN RD,AUSTIN,TX,78737,105,1,0.0,0.0
Q1,32020638758,LESLIE K RENFRO,12803 SHOSHONI TRL,AUSTIN,TX,78737,105,00002,THISTLE HILL STUDIO,12803 SHOSHONI TRL,AUSTIN,TX,78737,105,1,871.0,871.0
Q1,32050668345,"TIPPING T, LLC",4405 MANZANILLO DR,AUSTIN,TX,78749,227,00001,TIPPING T,13127 FITZHUGH RD,AUSTIN,TX,78736,105,1,9427.0,9427.0
It might look like more than 10 lines on your screen because they wrap.
If you want to specify how many lines to display, use the flag -n for number of lines:
head -n 2 data.csv
That will give you two lines, the first of which is the header of that file.
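If you want to try -n without the class data, here is a minimal sketch. The file sample.csv and its rows are made up for this example and are not part of the lesson files.

```shell
# Build a tiny, made-up CSV (a header plus three rows) to practice on.
printf 'name,city\nAda,Austin\nBen,Buda\nCam,Kyle\n' > sample.csv

# Keep only the first two lines: the header and the first data row.
first_two=$(head -n 2 sample.csv)
echo "$first_two"
```

You should see the header line followed by the Ada row.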
E.5 tail
tail shows you the bottom of the file. It takes the same -n flag.
tail data.csv
That result won’t show you the header, because you are looking at the last 10 lines of the file.
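Here is a small sketch of tail with the -n flag. It uses a made-up file called sample.csv (not part of the class data) so you can see exactly which lines come back. It also shows the form -n +2, which starts printing at line 2 and is a handy way to skip a header.

```shell
# Build a tiny, made-up CSV: one header line plus four rows.
printf 'name,city\nAda,Austin\nBen,Buda\nCam,Kyle\nDee,Manor\n' > sample.csv

# Show only the last two lines of the file.
last_two=$(tail -n 2 sample.csv)
echo "$last_two"

# With a plus sign, tail starts AT that line number instead of counting
# from the bottom, so this prints everything except the header.
no_header=$(tail -n +2 sample.csv)
echo "$no_header"
```

The first command prints the Cam and Dee rows. The second prints all four data rows without the header.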
E.6 wc
wc I think of as “word count”, but it can also count lines and bytes.
wc data.csv
gives you this output:
$ wc data.csv
100 1136 15642 data.csv
The first column is number of lines, then the number of words, then bytes.
If you want just the number of lines, use -l.
wc -l data.csv
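Since a CSV usually has a header, the line count is one more than the number of records. Here is one way to work that out, sketched with a made-up file called sample.csv. The variable names are just for illustration.

```shell
# A made-up CSV with one header line and three data rows.
printf 'name,city\nAda,Austin\nBen,Buda\nCam,Kyle\n' > sample.csv

# Feeding the file in with < makes wc print only the number,
# without the file name after it.
line_count=$(wc -l < sample.csv)

# Subtract one for the header to get the number of data rows.
data_rows=$((line_count - 1))
echo "$data_rows data rows"
```

This prints "3 data rows" for the sample file.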
E.7 cat
cat means to concatenate and print a file to your window. If you feed it two file names, it will give you the first, then the other. Do this:
cat data.csv
This will print the contents of data.csv to your screen. It’s showing all 100 lines.
But what you can do is redirect that output into a file by using >. Do this:
cat data.csv > file01.csv
What you’ve done is take the contents of data.csv, which cat would normally print to your screen, and redirect them into another file called file01.csv. Since that file didn’t exist already, it was created on the fly.
If I wanted to take two files, file01.csv and file02.csv, and combine them into a single file, it would look like this. (You don’t have to run these commands, just understand them.)
cat file01.csv file02.csv > combined.csv
Now, combined.csv would be the combination of both files.
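If you’d like to see the combining step actually run, here is a minimal sketch. The files part1.csv, part2.csv and both.csv are made-up names for this example.

```shell
# Two small, made-up files to stand in for the ones being combined.
printf 'Ada,Austin\nBen,Buda\n' > part1.csv
printf 'Cam,Kyle\n' > part2.csv

# cat prints part1.csv, then part2.csv, and > sends it all into both.csv.
cat part1.csv part2.csv > both.csv

# Print the new file to check the result.
cat both.csv
```

The new file has three lines, in the same order the file names were given. Be careful with >, because it replaces the target file if one already exists with that name.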
E.8 grep
grep is for using regular expressions to find patterns within a file. It takes a regular expression and a file name, and it returns the lines of the file that match that regular expression.
grep 'ATX INVESTMENTS' data.csv
… will print out just the lines in data.csv that have ‘ATX INVESTMENTS’ somewhere in them. Note that the name in the field is ‘ATX INVESTMENTS LLC’ but it found it with just part of that.
If you want to just know how many lines there are with ‘ATX INVESTMENTS’, use the -c flag for count:
$ grep -c 'ATX INVESTMENTS' data.csv
The answer should be 4.
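To see matching and counting side by side on something small, here is a sketch with a made-up sample.csv. It also uses -i, a standard grep flag that ignores upper and lower case.

```shell
# A made-up CSV where two of the three rows mention Austin.
printf 'name,city\nAda,Austin\nBen,Buda\nCam,Austin\n' > sample.csv

# Print every line that contains the text Austin.
grep 'Austin' sample.csv

# Count the matching lines instead of printing them.
match_count=$(grep -c 'Austin' sample.csv)
echo "$match_count"

# Adding -i ignores case, so a lowercase pattern still matches.
loose_count=$(grep -c -i 'austin' sample.csv)
echo "$loose_count"
```

Both counts come out to 2 for this sample file.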
E.9 Piping commands
You can “pipe” the results of one command into another command with the | character, which you’ll find as the shift of your backslash key. You can string commands together with that, so if I just wanted to see the first line that has ‘ATX INVESTMENTS’, I can do this:
grep 'ATX INVESTMENTS' data.csv | head -n 1
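Pipes work with any of the commands from this lesson. Here is a small sketch using a made-up sample.csv that sends grep results into tail and into wc. The variable names are just for illustration.

```shell
# A made-up CSV where three of the four rows mention Austin.
printf 'name,city\nAda,Austin\nBen,Buda\nCam,Austin\nDee,Austin\n' > sample.csv

# grep finds the matching lines, then tail keeps only the last one.
last_match=$(grep 'Austin' sample.csv | tail -n 1)
echo "$last_match"

# grep finds the matching lines, then wc counts them.
# This gives the same number as grep -c 'Austin' sample.csv.
austin_rows=$(grep 'Austin' sample.csv | wc -l)
echo $austin_rows
```

The first pipeline prints the Dee row, and the second prints 3.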