Linux Commands: Basic to Advanced Usage, Part 3 (sed and awk)

SED COMMANDS

Sed Basics - Find and Replace Using RegEx

This hack explains how to use the sed substitute command “s”.

The “s” command is probably the most important command in sed and has many options.

The “s” command attempts to match the pattern space against the supplied REGEXP; if the match succeeds, the portion of the pattern space that was matched is replaced with REPLACEMENT.

Syntax:

$ sed 'ADDRESSs/REGEXP/REPLACEMENT/FLAGS' filename

$ sed 'PATTERNs/REGEXP/REPLACEMENT/FLAGS' filename

• s is substitute command

• / is a delimiter

• REGEXP is regular expression to match

• REPLACEMENT is a value to replace

FLAGS can be any of the following :

• g Replace all instances of REGEXP with REPLACEMENT.

• n Any number; replace only the nth instance of REGEXP with REPLACEMENT.

• p If a substitution was made, print the new pattern space.

• i Match REGEXP in a case-insensitive manner (a GNU sed extension).

• w file If a substitution was made, write the result to the given file.

• The delimiter does not have to be /; another character, such as @ % ; or :, can be used instead.
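
As a small sketch of the delimiter point above: when the pattern itself contains slashes, choosing another delimiter avoids escaping each one. The input string here is made up for illustration and is not part of the examples that follow.

```shell
# Hypothetical input line that contains slashes.
line='bin=/usr/local/bin'

# With / as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$line" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the slashes need no escaping.
plain=$(printf '%s\n' "$line" | sed 's|/usr/local|/opt|')

echo "$escaped"
echo "$plain"
```

Both commands should produce the same result; the second is simply easier to read.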

Let us first create the thegeekstuff.txt file that will be used in all the examples below.

$ cat thegeekstuff.txt

# Instruction Guides

1. Linux Sysadmin, Linux Scripting etc.

2. Databases - Oracle, mySQL etc.

3. Security (Firewall, Network, Online Security etc)

4. Storage in Linux

5. Productivity (Too many technologies to explore, not

much time available)

# Additional FAQS

6. Windows- Sysadmin, reboot etc.

Substitute Word “Linux” with “Linux-Unix” Using sed s//

In the example below, only the first “Linux” in the output line “1. Linux-Unix Sysadmin, Linux Scripting etc.” is replaced by “Linux-Unix”. If no flags are specified, only the first match on each line is replaced.

$ sed 's/Linux/Linux-Unix/' thegeekstuff.txt

# Instruction Guides

1. Linux-Unix Sysadmin, Linux Scripting etc.

2. Databases - Oracle, mySQL etc.

3. Security (Firewall, Network, Online Security etc)

4. Storage in Linux-Unix

5. Productivity (Too many technologies to explore, not

much time available)

# Additional FAQS

6. Windows- Sysadmin, reboot etc.

Substitute all Appearances of a Word Using sed s//g

The sed command below replaces all occurrences of Linux with Linux-Unix using the global substitution flag “g”.

$ sed 's/Linux/Linux-Unix/g' thegeekstuff.txt

# Instruction Guides

1. Linux-Unix Sysadmin, Linux-Unix Scripting etc.

2. Databases - Oracle, mySQL etc.

3. Security (Firewall, Network, Online Security etc)

4. Storage in Linux-Unix

5. Productivity (Too many technologies to explore, not

much time available)

# Additional FAQS

6. Windows- Sysadmin, reboot etc.

Substitute Only 2nd Occurrence of a Word Using sed s//2

In the example below, only the 2nd occurrence of Linux in the output line “1. Linux Sysadmin, Linux-Unix Scripting etc.” is replaced by Linux-Unix. Lines with fewer than two occurrences are left unchanged.

$ sed 's/Linux/Linux-Unix/2' thegeekstuff.txt

# Instruction Guides

1. Linux Sysadmin, Linux-Unix Scripting etc.

2. Databases - Oracle, mySQL etc.

3. Security (Firewall, Network, Online Security etc)

4. Storage in Linux

5. Productivity (Too many technologies to explore, not

much time available)

# Additional FAQS

6. Windows- Sysadmin, reboot etc.

Write Changes to a File and Print the Changes Using sed s//gpw

The example below uses three flags. It substitutes all occurrences of Linux with Linux-Unix, prints the substituted lines, and writes the same lines to the given file. The -n option suppresses sed's default printing, so only the lines printed by the p flag appear.

$ sed -n 's/Linux/Linux-Unix/gpw output' thegeekstuff.txt

1. Linux-Unix Sysadmin, Linux-Unix Scripting etc.

4. Storage in Linux-Unix

$ cat output

1. Linux-Unix Sysadmin, Linux-Unix Scripting etc.

4. Storage in Linux-Unix
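
To see why -n is paired with the p flag, here is a minimal sketch on a two-line sample file (the file name and contents are assumptions made for this illustration). Without -n, a changed line shows up twice: once from p and once from sed's normal end-of-cycle printing.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Linux one' 'other two' > "$tmp/in.txt"

# Without -n: every line is auto-printed, and p prints changed lines again.
without_n=$(sed 's/Linux/Unix/p' "$tmp/in.txt")

# With -n: auto-printing is off, so only lines changed by s appear.
with_n=$(sed -n 's/Linux/Unix/p' "$tmp/in.txt")

echo "--- without -n ---"
echo "$without_n"
echo "--- with -n ---"
echo "$with_n"

rm -rf "$tmp"
```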

Substitute Only When the Line Matches with the Pattern Using sed
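
A minimal sketch of this idea, assuming a two-line sample file and the address /Storage/ (both chosen here for illustration): the pattern in front of s acts as an address, so the substitution runs only on lines that match it.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  '1. Linux Sysadmin, Linux Scripting etc.' \
  '4. Storage in Linux' > "$tmp/sample.txt"

# /Storage/ is the address: s is applied only to lines matching it.
result=$(sed '/Storage/s/Linux/Linux-Unix/' "$tmp/sample.txt")
echo "$result"

rm -rf "$tmp"
```

The first line also contains “Linux” but is left alone because it does not match the address.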

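Eliminate HTML Tags from a File Using sed

In this example, the regular expression in the sed command matches HTML tags and replaces them with nothing. No file is given, so sed reads standard input: the first line after the command is the typed input and the second is the output.

$ sed -e 's/<[^>]*>//g'

This <b> is </b> an <i>example</i>.

This  is  an example.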
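
The same substitution can be fed through a pipe instead of typed input, which is handier in scripts. This is only a sketch: because sed works one line at a time, it handles tags that open and close on the same line and should not be mistaken for a general HTML parser.

```shell
# Feed one line of markup to sed through a pipe.
clean=$(printf '%s\n' 'This <b> is </b> an <i>example</i>.' | sed -e 's/<[^>]*>//g')

echo "$clean"
```

The spaces that surrounded the removed tags remain, which is why the result contains double spaces.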

Awk Introduction

This hack explains the fundamental awk working methodology along with practical awk print examples.

Awk Introduction and Printing Operations

Awk is a programming language that allows easy manipulation of structured data and the generation of formatted reports. Awk is named after its authors: Aho, Weinberger, and Kernighan.

Awk is mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and then performs the associated actions.

Some of the key features of Awk are:

• Awk views a text file as records and fields.

• Like common programming languages, Awk has variables, conditionals, and loops.

• Awk has arithmetic and string operators.

• Awk can generate formatted reports.

• Awk reads from a file or from its standard input, and writes to its standard output. Awk does not get along with non-text files.
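
As a quick sketch of the “formatted reports” feature, awk's printf can pad fields into aligned columns. The two-line staff.txt file and the column width of 10 are assumptions made for this illustration.

```shell
tmp=$(mktemp -d)
cat > "$tmp/staff.txt" <<'EOF'
100 Thomas Sales
200 Jason Technology
EOF

# %-10s left-aligns the name and pads it to 10 columns.
report=$(awk '{ printf "%-10s %s\n", $2, $3 }' "$tmp/staff.txt")
echo "$report"

rm -rf "$tmp"
```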

Syntax:

awk '/search pattern1/ {Actions}

/search pattern2/ {Actions}' file

Awk Working Methodology

1. Awk reads the input files one line at a time.

2. For each line, it tries the given patterns in order; when a pattern matches, it performs the corresponding action.

3. If no pattern matches, no action is performed.

4. In the above syntax, either the search pattern or the action is optional, but not both.

5. If the search pattern is not given, Awk performs the given actions for each line of the input.

6. If the action is not given, Awk prints every line that matches the given pattern, which is the default action.

7. Empty braces without any action do nothing; the default printing is not performed.

8. Statements in Actions should be separated by semicolons or newlines.
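
Rules 5 to 7 above can be checked with a short sketch. The three-line input file is an assumption made for this illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'alpha 1' 'beta 2' 'gamma 3' > "$tmp/in.txt"

# Pattern only: the default action prints each matching line.
pattern_only=$(awk '/beta/' "$tmp/in.txt")

# Action only: the action runs for every input line.
action_only=$(awk '{ print $1 }' "$tmp/in.txt")

# Empty braces: the line matches, but nothing is printed.
empty_action=$(awk '/beta/ {}' "$tmp/in.txt")

echo "pattern only: $pattern_only"
echo "action only:"
echo "$action_only"
echo "empty action: [$empty_action]"

rm -rf "$tmp"
```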

Let us create an employee.txt file with the following content, which will be used in the examples below.

$ cat employee.txt

100 Thomas Manager Sales $5,000

200 Jason Developer Technology $5,500

300 Sanjay Sysadmin Technology $7,000

400 Nisha Manager Marketing $9,500

500 Randy DBA Technology $6,000

Print the lines that match the pattern

$ awk '/Thomas/

> /Nisha/' employee.txt

100 Thomas Manager Sales $5,000

400 Nisha Manager Marketing $9,500

The above example prints every line that matches ‘Thomas’ or ‘Nisha’. It has two patterns. Awk accepts any number of patterns, but each set (a pattern and its corresponding action) has to be separated from the next by a newline or a semicolon.
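
For comparison, here is a sketch of two one-line ways to get the same result, assuming the same employee.txt content: two rules separated by a semicolon, or a single rule whose regular expression uses alternation.

```shell
tmp=$(mktemp -d)
cat > "$tmp/employee.txt" <<'EOF'
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
EOF

# Two separate rules on one line, separated by a semicolon.
two_rules=$(awk '/Thomas/; /Nisha/' "$tmp/employee.txt")

# One rule whose regular expression matches either name.
one_regex=$(awk '/Thomas|Nisha/' "$tmp/employee.txt")

echo "$two_rules"
echo "$one_regex"

rm -rf "$tmp"
```

The two forms differ in one respect: a line that matched both names would be printed twice by the two-rule form and once by the alternation form.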

Print only specific fields

Awk has a number of built-in variables. For each record (that is, each line), it splits the record on whitespace by default and stores the fields in the $n variables. If a line has 4 words, they are stored in $1, $2, $3 and $4. $0 represents the whole line. NF is a built-in variable that holds the total number of fields in a record, so $NF refers to the last field.

$ awk '{print $2,$5;}' employee.txt

Thomas $5,000

Jason $5,500

Sanjay $7,000

Nisha $9,500

Randy $6,000

$ awk '{print $2,$NF;}' employee.txt

Thomas $5,000

Jason $5,500

Sanjay $7,000

Nisha $9,500

Randy $6,000
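
A small sketch of NF, $0 and $NF on lines with different field counts may help; the two input lines are made up for this illustration.

```shell
# Print the field count, the whole record, and the last field of each line.
fields=$(printf '%s\n' 'a b c' 'd e' |
  awk '{ print NF ": " $0 " (last=" $NF ")" }')

echo "$fields"
```

Because NF changes per record, $NF picks the last field even when lines have different numbers of fields.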

Find the employees who have an employee id greater than 200

$ awk '$1 >200' employee.txt

300 Sanjay Sysadmin Technology $7,000

400 Nisha Manager Marketing $9,500

500 Randy DBA Technology $6,000

In the above example, the first field ($1) is the employee id. If $1 is greater than 200, the default action prints the whole line.
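
A comparison pattern can also be paired with an explicit action. The sketch below, assuming the same employee.txt content, prints only the names instead of the whole line.

```shell
tmp=$(mktemp -d)
cat > "$tmp/employee.txt" <<'EOF'
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
EOF

# Numeric comparison on field 1, with an action that prints field 2 only.
names=$(awk '$1 > 200 { print $2 }' "$tmp/employee.txt")
echo "$names"

rm -rf "$tmp"
```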

Print the list of employees in the Technology department

The department name is the fourth field, so we check whether $4 matches the string “Technology” and, if it does, print the line.

$ awk '$4 ~/Technology/' employee.txt

200 Jason Developer Technology $5,500

300 Sanjay Sysadmin Technology $7,000

500 Randy DBA Technology $6,000

The ~ operator matches a value against a regular expression. If it matches, the default action (printing the whole line) is performed.
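
Two related operators are worth a quick sketch, assuming the same employee.txt content: == compares against an exact string rather than a regular expression, and !~ selects the records that do not match.

```shell
tmp=$(mktemp -d)
cat > "$tmp/employee.txt" <<'EOF'
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
EOF

# Exact string comparison on the department field.
exact=$(awk '$4 == "Technology" { print $2 }' "$tmp/employee.txt")

# Negated regular expression match: everyone outside Technology.
negated=$(awk '$4 !~ /Technology/ { print $2 }' "$tmp/employee.txt")

echo "$exact"
echo "$negated"

rm -rf "$tmp"
```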

Print number of employees in Technology department

The example below checks whether the department is Technology and, if so, increments the count variable in the action. The variable is initialized to zero in the BEGIN section.

$ awk 'BEGIN { count=0;}

$4 ~ /Technology/ { count++; }

END { print "Number of employees in Technology Dept =",count;}' employee.txt

Number of employees in Technology Dept = 3
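
As a side note, awk variables start out empty (treated as zero in arithmetic), so the BEGIN block is optional for a simple counter. The sketch below assumes the same employee.txt content; adding 0 in the END block makes the output a number even if no line matched.

```shell
tmp=$(mktemp -d)
cat > "$tmp/employee.txt" <<'EOF'
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
EOF

# No BEGIN block: count is created on first use.
tech=$(awk '$4 ~ /Technology/ { count++ } END { print count + 0 }' "$tmp/employee.txt")

# A department that never matches still yields 0 rather than an empty line.
none=$(awk '$4 ~ /Finance/ { count++ } END { print count + 0 }' "$tmp/employee.txt")

echo "Technology: $tech"
echo "Finance: $none"

rm -rf "$tmp"
```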

That's a wrap...