FIT5145: Workshop 3

FIT5145 Week 4 (Workshop 3)


Before we start

Feedback QR Code
Make sure to fill out the feedback form:
  • Jae
  • Jesmin
  • Swathi (5 PM)
  • Lin (7 PM)

Data streaming and UNIX commands


Installing terminal emulator

  • MacOS/Linux: Use the built-in terminal
  • Windows: Install Git Bash as directed (also include unzip during installation)

Let us know if you need any help!


Basic UNIX shell commands

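command   function
ls        list directory contents
cd        change directory
pwd       print working directory
unzip     extract files from a .zip archive
less      view (stream) a file one screen at a time
grep      match lines against a pattern
wc        count lines, words and bytes (word count)
cat       concatenate files and print them
awk       command-line scripting language for text processing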
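The commands in the table can be tried together in a throwaway directory. This is a minimal sketch, assuming a Bash-compatible shell (Git Bash, or the macOS/Linux terminal); the file `notes.txt` is a hypothetical sample created by the script itself. `unzip` and `less` are only mentioned in comments, since one needs a real .zip archive and the other an interactive terminal.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so nothing else is touched; remove it on exit
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample file: a name and a number on each line
printf 'apple 3\nbanana 5\ncherry 7\n' > notes.txt

pwd                        # print the working directory (the temp dir)
ls                         # list directory contents (notes.txt)
cat notes.txt              # print the whole file
grep "an" notes.txt        # only lines containing "an" (banana 5)
wc -l notes.txt            # count lines (3 here)
awk '{ sum += $2 } END { print sum }' notes.txt   # sum column 2 (15)

# unzip archive.zip would extract a real archive;
# less notes.txt pages through a file (press q to quit)
```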

Go through Lab Activity

  • Why do I need to type so much? Use Tab completion, and the Up/Down arrow keys to recall previous commands

In which chapter did Alice go to a "Mad Tea-Party"?

We use grep twice: first to find which file is Alice in Wonderland, then to search that file for the chapter title:

$ grep "Alice.*Wonderland" book*.txt
book1.txt:Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll
book1.txt:Title: Alice's Adventures in Wonderland
book1.txt:End of Project Gutenberg's Alice's Adventures in Wonderland, by Lewis Carroll
$ grep "Mad Tea-Party" book1.txt
CHAPTER VII. A Mad Tea-Party
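A few grep options are handy for this kind of search: `-n` prefixes each match with its line number, `-c` counts matching lines, and `-i` ignores case. A sketch, assuming a small hypothetical `sample.txt` written by the script as a stand-in for book1.txt from the lab files:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical stand-in for book1.txt
printf 'CHAPTER VI. Pig and Pepper\nCHAPTER VII. A Mad Tea-Party\nThe tea-party continued.\n' > sample.txt

grep -n "Mad Tea-Party" sample.txt   # 2:CHAPTER VII. A Mad Tea-Party
grep -c "CHAPTER" sample.txt         # 2 matching lines
grep -ic "tea-party" sample.txt      # 2: case-insensitive, matches both spellings
```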

Who are the authors of each book?

There are many options; I use a for loop, whose general form is:

for variable in {start..end}; do command; done
$ for i in {1..10}; do grep "Author:" book${i}.txt; done
Author: Lewis Carroll
Author: James Joyce
Author: Leo Tolstoy
Author: Arthur Conan Doyle
Author: Oscar Wilde
Author: Franz Kafka
Author: Bram Stoker
Author: Charles Dickens
Author: Jane Austen
Author: Rudyard Kipling
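The loop prints the whole matching line. One way to keep only the names is to pipe the loop's output into awk, splitting each line on `": "` and printing the second field. A sketch, assuming three hypothetical two-line files written by the script in place of the real book files:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical stand-ins for book1.txt .. book3.txt
printf 'Title: A\nAuthor: Lewis Carroll\n' > book1.txt
printf 'Title: B\nAuthor: James Joyce\n'   > book2.txt
printf 'Title: C\nAuthor: Leo Tolstoy\n'   > book3.txt

# for variable in {start..end}; do command; done
for i in {1..3}; do
  grep "Author:" "book${i}.txt"
done | awk -F': ' '{ print $2 }'   # keep only the text after "Author: "
# Lewis Carroll
# James Joyce
# Leo Tolstoy
```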

Display the authors of books 1 to 5?

We can use grep with either a for loop or the glob pattern [1-5]:

$ grep "Author:" book[1-5].txt
book1.txt:Author: Lewis Carroll
book2.txt:Author: James Joyce
book3.txt:Author: Leo Tolstoy
book4.txt:Author: Arthur Conan Doyle
book5.txt:Author: Oscar Wilde
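Note that `[1-5]` matches exactly one character, so it cannot select book1 to book10 in one pattern: `[1-10]` is read as the character set {1, 0}, not the range 1 to 10. Brace expansion generates the names instead. A sketch, assuming ten empty hypothetical files created by the script:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch book{1..10}.txt     # create empty book1.txt .. book10.txt

echo book[1-5].txt        # glob: book1.txt book2.txt book3.txt book4.txt book5.txt
echo book[1-10].txt       # glob with set {1,0}: only book1.txt exists to match
echo book{6..10}.txt      # brace expansion: book6.txt ... book10.txt
```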

List the word count of all the books at once?

  • wc, with -w to report only the word count
  • a for loop, or the wildcard * to match every book file at once
$ for i in {1..10}; do wc -w book${i}.txt; done
 29461 book1.txt
268034 book2.txt
566188 book3.txt
107533 book4.txt
 23731 book5.txt
 25186 book6.txt
164424 book7.txt
138879 book8.txt
124588 book9.txt
 53856 book10.txt
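Passing every file to a single wc call with the `*` wildcard also prints a grand total line, and piping through `sort -n` ranks the books by length. A sketch, assuming three tiny hypothetical files in place of the real books:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical small stand-ins for the book files
echo "one two three"           > book1.txt   # 3 words
echo "one"                     > book2.txt   # 1 word
echo "one two three four five" > book3.txt   # 5 words

wc -w book*.txt              # one line per file, plus a "total" line (9)
wc -w book*.txt | sort -n    # shortest book first; the total sorts last
```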

Comparing R with other tools

  • R might be more suited for data science because it is typically better at statistical analysis, modelling and visualisation.
  • Python may make more sense if you plan on integrating general-purpose programming with data science.
  • CLI tools are great for quick-and-dirty data examination and manipulation. They also don't need a graphical interface, i.e. you can work remotely.