Shell - a user interface for running commands, and also a scripting language for automating tasks.
-The default shell on most Linux systems is Bash
-Others include: sh, ksh, tcsh, zsh and fish.
~ means home directory.
Copying files and saving them in backup
cp seasonal/summer.csv backup/summer.bck
Moving files to backup (the last argument is the destination directory)
mv seasonal/spring.csv seasonal/summer.csv backup
Renaming Files
mv winter.csv winter.csv.bck
Deleting files
rm seasonal/summer.csv
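The copy, move, rename and delete commands above can be tried safely in a scratch directory. This is a minimal sketch: the temporary directory and the empty files created with touch are assumptions made for the demo and stand in for the real seasonal data.

```shell
# Scratch area so no real files are touched (hypothetical demo layout).
w=$(mktemp -d)
mkdir "$w/seasonal" "$w/backup"
touch "$w/seasonal/spring.csv" "$w/seasonal/summer.csv" "$w/winter.csv"

# Copy one file into backup under a new name.
cp "$w/seasonal/summer.csv" "$w/backup/summer.bck"

# Move two files at once: the last argument is the destination directory.
mv "$w/seasonal/spring.csv" "$w/seasonal/summer.csv" "$w/backup"

# Renaming is just a move within the same directory.
mv "$w/winter.csv" "$w/winter.csv.bck"

# Delete a single file (there is no undo).
rm "$w/backup/summer.csv"

remaining=$(ls "$w/backup")
if [ -e "$w/winter.csv.bck" ]; then renamed=yes; else renamed=no; fi
echo "$remaining"
echo "renamed: $renamed"

rm -r "$w"   # clean up the scratch area
```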
Creating and removing directories
mkdir - creates a new directory
rmdir - removes a directory, but only if it is empty
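A short sketch of how mkdir and rmdir behave, run in a temporary directory. The directory and file names are made up for the demo; the -p flag of mkdir is not mentioned in the notes and is shown only as a convenience.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

mkdir "$workdir/backup"              # create a new directory
mkdir -p "$workdir/a/b/c"            # -p also creates missing parent directories

touch "$workdir/backup/summer.bck"   # put a file inside backup

# rmdir refuses to remove a directory that still has contents.
if rmdir "$workdir/backup" 2>/dev/null; then
    nonempty_result="removed"
else
    nonempty_result="refused"
fi

rm "$workdir/backup/summer.bck"      # empty the directory first
rmdir "$workdir/backup"              # now rmdir succeeds

if [ -d "$workdir/backup" ]; then
    empty_result="still there"
else
    empty_result="removed"
fi
echo "non-empty: $nonempty_result, empty: $empty_result"

rm -r "$workdir"                     # clean up
```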
Viewing the contents of files such as CSVs with less: press spacebar to page down, :n to go to the next file, :p to go back to the previous file, and q to quit
less seasonal/spring.csv seasonal/summer.csv
Viewing the first lines of a file; use -n to specify the number of lines
head -n 5 seasonal/winter.csv
Listing files, including those nested in subdirectories (-R recurses into directories, -F marks directories with / and executables with *)
ls -R -F
Storing a command's output in a file with >
Example:
tail -n 5 seasonal/winter.csv >last.csv
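To see redirection in action without the real dataset, the sketch below builds a five-line stand-in file (its contents are invented for the demo) and saves the last two lines to another file. Note that > overwrites the target file if it already exists.

```shell
workdir=$(mktemp -d)
# Hypothetical stand-in for seasonal/winter.csv.
printf 'line1\nline2\nline3\nline4\nline5\n' > "$workdir/winter.csv"

# Nothing is printed here: the output goes into last.csv instead of the screen.
tail -n 2 "$workdir/winter.csv" > "$workdir/last.csv"

# The saved file can then be used as input to another command.
saved=$(cat "$workdir/last.csv")
first_saved=$(head -n 1 "$workdir/last.csv")
echo "$saved"
echo "first saved line: $first_saved"

rm -r "$workdir"   # clean up
```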
SELECTING COLUMNS
cut -f 2-5,8 -d , chris.csv
This means "select columns 2 through 5 and column 8, using comma as the separator". cut uses -f (meaning "fields") to specify columns and -d (meaning "delimiter") to specify the separator. You need to specify the latter because some files may use spaces, tabs, or colons to separate columns.
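The cut options can be checked on a tiny file. This is a sketch under the assumption that chris.csv has at least eight comma-separated columns; the two rows below are invented purely for illustration.

```shell
workdir=$(mktemp -d)
# Hypothetical sample data: 8 comma-separated columns.
printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n' > "$workdir/chris.csv"

# Select fields 2 through 5 and field 8, with comma as the delimiter.
# There must be no space inside the field list "2-5,8".
selected=$(cut -f 2-5,8 -d , "$workdir/chris.csv")
echo "$selected"
# prints:
# b,c,d,e,h
# 2,3,4,5,8

rm -r "$workdir"   # clean up
```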
To check your default shell, print the SHELL environment variable on the command line, for example with echo $SHELL.
If the default shell is not Bash, simply type 'bash' on the CLI (command line) to start it.
Common uses of shell commands:
-running batch jobs (ETL)
-getting information
-printing files and strings
-compression and archiving
-performing network operations
ls | sort -r
man ls | head -20
Filters - commands that transform input data into output data
-examples: wc, cat, more, head, sort and grep; filters can be chained together.
Pipe - denoted by |
-used for chaining filter commands
-```command1 | command2```
-the output of the first command is the input of the second command
-"pipe" is short for pipeline
-```ls | sort -r``` lists the directory contents in reverse sorted order
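A pipeline can chain more than two filters. In the sketch below, printf stands in for any command that produces lines of output (the four words are made up); its output is reverse-sorted and then trimmed to the first line.

```shell
# printf plays the role of command1; sort and head are the filters.
largest=$(printf 'pear\napple\nzebra\nmango\n' | sort -r | head -n 1)
echo "$largest"    # prints: zebra
```

Each stage only sees the output of the stage before it, so stages can be added or removed without changing the others.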
Variables - scope limited to the current shell
-use set to list all shell variables
-```set | head -4```
-assign variables with = and no spaces around it: var=value
```
name='chris'
# to print the value we use
echo $name
```
-deleting variables
``` unset name```
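Environment variables - have extended scope: they are also visible to child processes started from the shell
-```export variablename``` turns a shell variable into an environment variable
-to list all environment variables use ```env```
-```env | grep variablename```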
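The difference in scope between a shell variable and an environment variable can be seen by asking a child process for the value. This is a minimal sketch: the child is started with sh -c, and the ${name:-unset} expansion (which prints "unset" when the variable is missing) is used only to make the result visible.

```shell
unset name                        # start from a clean state
name='chris'                      # plain shell variable, no spaces around =

# A child process does not see a plain shell variable.
before=$(sh -c 'echo "${name:-unset}"')

export name                       # promote it to an environment variable
after=$(sh -c 'echo "${name:-unset}"')

echo "before export: $before"     # prints: before export: unset
echo "after export: $after"       # prints: after export: chris

unset name                        # removes the variable entirely
gone=${name:-unset}
echo "after unset: $gone"         # prints: after unset: unset
```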
Script - a list of commands interpreted by a scripting language.
-scripts are used for automating processes
-a shell script is an executable text file with an interpreter directive
The interpreter directive, also called the shebang, is usually written as:
#!interpreter [optional-argument]
interpreter - the absolute path to an executable program
optional-argument - a single argument string
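Putting the pieces together, the sketch below writes a two-line script, marks it executable and runs it. The file name hello.sh and the choice of /bin/sh as interpreter are assumptions for the demo; it also assumes the temporary directory allows executing files.

```shell
workdir=$(mktemp -d)
script="$workdir/hello.sh"        # hypothetical script name

# Write a script whose first line is the interpreter directive.
cat > "$script" <<'EOF'
#!/bin/sh
echo "hello from $0"
EOF

chmod +x "$script"                # make the text file executable
output=$("$script")               # the system runs it with /bin/sh
echo "$output"

rm -r "$workdir"                  # clean up
```

Without the chmod step the file is still valid text, but running it directly fails with a permission error.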
Batch mode:
commands run sequentially
command1 runs first, then command2
command1; command2
Concurrent mode:
commands run in parallel
command1 starts in the background and the shell immediately runs command2 in the foreground; no data is passed between them.
command1 & command2
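The two modes can be compared with a deliberately slow command. In this sketch, sleep 1 followed by echo plays the role of a long-running command1; the wait builtin, which is not covered in the notes, is added so that the background job finishes before its output is collected.

```shell
# Batch mode: the second command starts only after the first finishes.
sequential=$(echo first; echo second)

# Concurrent mode: the slow command goes to the background with &,
# so the foreground command prints right away; wait then blocks
# until the background job is done.
concurrent=$( (sleep 1; echo slow) & echo fast; wait )

echo "$sequential"
# prints:
# first
# second

echo "$concurrent"
# prints:
# fast
# slow
```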
head and tail select rows, cut selects columns, and grep selects lines according to what they contain.
In its simplest form, grep takes a piece of text followed by one or more filenames and prints all of the lines in those files that contain that text.
For example, grep bicuspid seasonal/winter.csv prints lines from winter.csv that contain "bicuspid".
grep can search for patterns as well; we will explore those in the next course. What's more important right now is some of grep's more common flags:
-c: print a count of matching lines rather than the lines themselves
-h: do not print the names of files when searching multiple files
-i: ignore case (e.g., treat "Regression" and "regression" as matches)
-l: print the names of files that contain matches, not the matches
-n: print line numbers for matching lines
-v: invert the match, i.e., only show lines that don't match
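Several of these flags can be tried on a small file. The sketch below uses an invented five-line stand-in for seasonal/winter.csv, with one capitalised "Canine" so that the effect of -i is visible.

```shell
workdir=$(mktemp -d)
f="$workdir/winter.csv"
# Hypothetical sample data, not the real dataset.
printf 'Date,Tooth\n2017-01-03,bicuspid\n2017-01-05,Canine\n2017-01-21,canine\n2017-02-05,molar\n' > "$f"

exact=$(grep -c canine "$f")        # count lines containing "canine" exactly
anycase=$(grep -i -c canine "$f")   # same count, ignoring case
numbered=$(grep -n molar "$f")      # matching line with its line number
inverted=$(grep -v -i canine "$f")  # every line that does NOT match

echo "$exact"      # prints: 1
echo "$anycase"    # prints: 2
echo "$numbered"   # prints: 5:2017-02-05,molar
echo "$inverted"

rm -r "$workdir"   # clean up
```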
Data Manipulation Pipeline
cut -d , -f 2 seasonal/summer.csv | grep -v Tooth | head -n 1
As its name suggests, sort puts data in order.
By default it does this in ascending alphabetical order, but the flags -n and -r can be used to sort numerically and reverse the order of its output, while -b tells it to ignore leading blanks and -f tells it to fold case (i.e., be case-insensitive).
Pipelines often use grep to get rid of unwanted records and then sort to put the remaining records in order.
cut -d , -f 2 seasonal/winter.csv | grep -v Tooth |sort -r
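The effect of -n and -r is easiest to see on numbers, where alphabetical and numeric order disagree. A small sketch with three made-up values:

```shell
# Alphabetical order compares character by character, so "10" sorts before "9".
alpha=$(printf '10\n9\n100\n' | sort | head -n 1)

# -n compares the values as numbers.
numeric=$(printf '10\n9\n100\n' | sort -n | head -n 1)

# -n and -r together give the largest number first.
reversed=$(printf '10\n9\n100\n' | sort -n -r | head -n 1)

echo "$alpha"      # prints: 10
echo "$numeric"    # prints: 9
echo "$reversed"   # prints: 100
```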
uniq removes adjacent duplicate lines, which is why the input is sorted first; -c prefixes each line with the number of times it occurred.
cut -f 2 -d , seasonal/winter.csv | grep -v Tooth | sort | uniq -c
4 bicuspid
7 canine
6 incisor
4 molar
4 wisdom
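Because uniq only compares neighbouring lines, skipping the sort step leaves duplicates behind. The sketch below shows this with four invented values.

```shell
# Without sorting, the repeated "canine" lines are not all adjacent,
# so uniq only collapses the last two.
unsorted=$(printf 'canine\nmolar\ncanine\ncanine\n' | uniq)

# After sorting, identical lines sit next to each other and collapse fully.
sorted=$(printf 'canine\nmolar\ncanine\ncanine\n' | sort | uniq)

echo "$unsorted"
# prints:
# canine
# molar
# canine

echo "$sorted"
# prints:
# canine
# molar
```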
# Looping
for filetype in docx odt pdf; do echo $filetype; done
$ files=seasonal/*.csv
$ for f in $files; do echo $f; done
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv
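A loop body can do real work on each file, not just echo its name. As a sketch, the loop below makes a .bck copy of every CSV in a scratch seasonal directory; the directory and the empty files are assumptions for the demo, and the variable is quoted so that file names with spaces would also work.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
touch "$workdir/seasonal/autumn.csv" "$workdir/seasonal/spring.csv"

copied=0
# The wildcard expands to one word per matching file, in sorted order.
for f in "$workdir"/seasonal/*.csv; do
    cp "$f" "$f.bck"              # back up each file next to the original
    copied=$((copied + 1))
done

backups=$(ls "$workdir/seasonal" | grep -c '\.bck$')
echo "copied $copied files, found $backups backups"

rm -r "$workdir"                  # clean up
```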