
awk is one of the most powerful utilities in the Unix world. When it comes to text parsing, sed and awk can do some unbelievable things. In this first article on awk, we will see the basic usage of awk.
The syntax of awk is:
awk 'pattern{action}' file
where the pattern indicates the condition on which the action is to be executed for every matching line. If no pattern is present, the action is executed for every line of the file. If no action part is present, the default action of printing the line is performed. Let us see some examples.
Assume a file, say file1, with the following content:
$ cat file1
Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration
This file has 2 fields: the first field is the name of a person, and the second field is their area of expertise, the first line being the header record.
1. To print only the names present in the file:
$ awk '{print $1}' file1
Name
Deepak
Neha
Vijay
Guru
The above awk command does not have any pattern or condition, hence the action is executed on every line of the file. The action statement reads "print $1". While reading a file, awk splits each line into columns $1, $2, $3 and so on; the first column is accessible as $1, the second as $2, etc. Hence the above command prints all the names, which happen to be the first column of the file.
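Fields can also be printed in any order, and the built-in variable NF holds the number of fields on the current line. A minimal self-contained sketch (the sample is recreated inline; the name sample.txt is made up here):

```shell
# Recreate a small two-line sample so the sketch stands alone
printf 'Name Domain\nDeepak Banking\n' > sample.txt
# Print the second field, then the first, then the field count NF
awk '{print $2, $1, NF}' sample.txt
```

This prints "Domain Name 2" and "Banking Deepak 2", showing that the fields can be rearranged freely in the print statement.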
2. Similarly, to print the second column of the file:
$ awk '{print $2}' file1
Domain
Banking
Telecom
Finance
Migration
3. In the first example, the list of names got printed along with the header record. How do we omit the header record and get only the names printed?

$ awk 'NR!=1{print $1}' file1
Deepak
Neha
Vijay
Guru
The above awk command uses the special variable NR. NR holds the current line number, ranging from 1 to the actual line count. The condition 'NR!=1' tells awk not to execute the action part for the first line of the file, and hence the header record gets skipped.
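NR can also be printed directly, which gives a quick way of numbering lines; a small sketch (sample file recreated inline):

```shell
printf 'Name Domain\nDeepak Banking\nNeha Telecom\n' > sample.txt
# Prefix every line with its record number
awk '{print NR, $0}' sample.txt
```

This prints "1 Name Domain", "2 Deepak Banking", "3 Neha Telecom".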
4. How do we print the entire file contents?
$ awk '{print $0}' file1
Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration
$0 stands for the entire line, and hence when we do "print $0", the whole line gets printed.
5. How do we get the entire file contents printed another way?
$ awk '1' file1
Name Domain
Deepak Banking
Neha Telecom
Vijay Finance
Guru Migration
The above awk command has only the pattern or condition part and no action part. The '1' in the pattern means "true", i.e., true for every line. As said above, the default action when no action statement is given is to print the line, and hence the entire file contents get printed.
Let us now consider a file with a delimiter. The delimiter used here is a comma; a comma-separated file is called a CSV file. Assume the file contents to be:
$ cat file1
Name,Domain,Expertise
Deepak,Banking,MQ Series
Neha,Telecom,Power Builder
Vijay,Finance,CRM Expert
Guru,Migration,Unix
This file contains 3 fields, the new field being the expertise of the respective person.
6. Let us try to print the first column of this CSV file using the same method as in Point 1.
$ awk '{print $1}' file1
Name,Domain,Expertise
Deepak,Banking,MQ
Neha,Telecom,Power
Vijay,Finance,CRM
Guru,Migration,Unix

The output looks weird, doesn't it? We expected only the first column to get printed, but it printed a little more, and nothing definitive at that. If you look carefully, it printed every line up to the first space. By default, awk uses whitespace as the delimiter, which could be a single space, a tab, or a series of spaces; hence our original file was split into fields based on spaces.
Since our requirement now involves dealing with a comma-separated file, we need to specify the delimiter:
$ awk -F"," '{print $1}' file1
Name
Deepak
Neha
Vijay
Guru
awk has a command-line option "-F" with which we can specify the delimiter. Once the delimiter is specified, awk splits the file on the basis of that delimiter, and hence we got the names by printing the first column, $1.
7. awk has a special variable called "FS", which stands for field separator. In place of the command-line option "-F", we can also use "FS":
$ awk '{print $1,$3}' FS="," file1
Name Expertise
Deepak MQ Series
Neha Power Builder
Vijay CRM Expert
Guru Unix
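FS can also be set inside a BEGIN block, which runs before the first record is read; this keeps the whole program in one place. A minimal sketch (sample file recreated inline):

```shell
printf 'Name,Domain,Expertise\nDeepak,Banking,MQ Series\n' > sample.csv
# BEGIN runs before any input is read, so FS takes effect from the first record
awk 'BEGIN{FS=","}{print $3}' sample.csv
```

This prints "Expertise" and "MQ Series", the same result as using -F"," on the command line.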
8. Similarly, to print the second column:
$ awk -F, '{print $2}' file1
Domain
Banking
Telecom
Finance
Migration
9. To print the first and third columns, i.e., the name and the expertise:
$ awk -F"," '{print $1, $3}' file1
Name Expertise
Deepak MQ Series
Neha Power Builder
Vijay CRM Expert
Guru Unix
10. The output shown above is not easily readable, since the third column has more than one word. It would be better if the fields were displayed with a delimiter. Let's use a comma to separate the output, and also discard the header record.
$ awk -F"," 'NR!=1{print $1,$3}' OFS="," file1
Deepak,MQ Series
Neha,Power Builder
Vijay,CRM Expert
Guru,Unix
OFS is another awk special variable. Just as FS is used to separate the input fields, OFS (output field separator) is used to separate the output fields.
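To see FS and OFS working together, here is a small sketch that reads comma-separated fields and writes them back out pipe-separated (the pipe is an arbitrary choice for illustration):

```shell
printf 'Name,Domain,Expertise\nDeepak,Banking,MQ Series\n' > sample.csv
# FS splits the input on commas; OFS joins the printed fields with pipes
awk 'BEGIN{FS=","; OFS="|"}{print $1, $3}' sample.csv
```

Note that OFS is applied only where the arguments to print are separated by a comma; print $1 $3 (no comma) would concatenate the two fields with no separator at all.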

At times, we might need to pass arguments to an awk program, or to access a shell variable or an environment variable inside awk. Let us see in this article how to pass and access arguments in awk.
Let us take a sample file and a shell variable "x":
$ cat file1
24
12
34
45
$ echo $x
3
Now, say we want to add the shell variable x to every value.
1. awk provides a "-v" option to pass arguments. Using this, we can pass the shell variable to it:
$ awk -v val=$x '{print $0+val}' file1
27
15
37
48
As seen above, the shell variable $x is assigned to the awk variable "val". This variable "val" can be accessed directly inside awk.
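Several -v assignments can be given, one per variable; a sketch adding two shell values to every line (the awk variable names a and b are arbitrary):

```shell
printf '24\n12\n' > nums.txt
x=3; y=10
# One -v option per variable to be passed in
awk -v a="$x" -v b="$y" '{print $0 + a + b}' nums.txt
```

This prints 37 and 25, each input value plus both shell values.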
2. awk provides another way of passing arguments without using -v: just before the file name, provide the shell variable assignments to awk variables, as shown below.
$ awk '{print $0,val}' OFS=, val=$x file1
24,3
12,3
34,3
45,3
3. How do we access environment variables in awk? Unlike shell variables, awk provides a way to access environment variables without passing them as above: the special variable ENVIRON does the needful.
$ echo $x
3
$ export x

$ awk '{print $0,ENVIRON["x"]}' OFS=, file1
24,3
12,3
34,3
45,3
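ENVIRON works for any exported variable; a minimal sketch using a made-up variable GREETING:

```shell
export GREETING="hello"
# ENVIRON is an associative array holding the exported environment
awk 'BEGIN{print ENVIRON["GREETING"]}'
```

This prints hello. Note that the variable must be exported; a plain shell assignment is not visible in ENVIRON.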
Quoting file contents:
Sometimes we might have a requirement to quote the file contents. Assume you have a file which contains a list of database tables, and for your requirement you need to quote the file contents:
$ cat file
CUSTOMER
BILL
ACCOUNT
4. To single quote the contents, pass a variable to awk which contains the single quote character, and print the quote, the line, then the quote:
$ awk -v q="'" '{print q $0 q}' file
'CUSTOMER'
'BILL'
'ACCOUNT'
5. Similarly, to double quote the contents, pass the variable within single quotes:
$ awk '{print q $0 q}' q='"' file
"CUSTOMER"
"BILL"
"ACCOUNT"

In this article, we will see how to search for a pattern in a file using awk: matching a pattern either in the entire line or in a specific column.
Let us consider a CSV file with the following contents. The data in the file is a kind of expense report. Let us see how to use awk to filter data from the file.
$ cat file
Medicine,200
Grocery,500
Rent,900
Grocery,800
Medicine,600
1. To print only the records containing Rent:
$ awk '$0 ~ /Rent/{print}' file
Rent,900
~ is the symbol used for pattern matching, and the / / symbols are used to specify the pattern. The above line reads: if the line ($0) contains (~) the pattern Rent, print the line. The 'print' statement by default prints the entire line. This is actually a simulation of the grep command using awk.
2. By default, awk does pattern matching on the entire line, and hence $0 can be left out, as shown below:
$ awk '/Rent/{print}' file
Rent,900
3. Since awk prints the line by default on a true condition, the print statement can also be left out:
$ awk '/Rent/' file
Rent,900
In this example, whenever the line contains Rent, the condition becomes true and the line gets printed.
4. In the above examples, the pattern matching is done on the entire line; however, the pattern we are looking for is only in the first column. This might lead to incorrect results if the file contains the word Rent elsewhere. To match a pattern only in the first column ($1):
$ awk -F, '$1 ~ /Rent/' file
Rent,900
The -F option in awk is used to specify the delimiter. It is needed here since we are going to work on specific columns, which can be retrieved only when the delimiter is known.
5. The above pattern match will also succeed if the first column contains "Rents". To match exactly the word "Rent" in the first column:
$ awk -F, '$1=="Rent"' file
Rent,900
6. To print only the 2nd column for all "Medicine" records:
$ awk -F, '$1 == "Medicine"{print $2}' file
200
600
7. To match for patterns "Rent" or "Medicine" in the file:
$ awk '/Rent|Medicine/' file
Medicine,200
Rent,900
Medicine,600
8. Similarly, to match for this above pattern only in the first column:
$ awk -F, '$1 ~ /Rent|Medicine/' file
Medicine,200
Rent,900
Medicine,600
9. What if the first column contains the word "Medicines"? The above example will match it as well. In order to match exactly Rent or Medicine:
$ awk -F, '$1 ~ /^Rent$|^Medicine$/' file
Medicine,200
Rent,900
Medicine,600
The ^ symbol indicates the beginning of the line, and $ indicates the end of the line. ^Rent$ matches exactly the word Rent in the first column, and the same goes for the word Medicine.
10. To print the lines which do not contain the pattern Medicine:
$ awk '!/Medicine/' file
Grocery,500
Rent,900
Grocery,800
The ! is used to negate the pattern search.
11. To negate the pattern only on the first column alone:
$ awk -F, '$1 !~ /Medicine/' file
Grocery,500
Rent,900
Grocery,800
12. To print all records whose amount is greater than 500:
$ awk -F, '$2>500' file
Rent,900
Grocery,800
Medicine,600
13. To print the Medicine record only if it is the 1st record:
$ awk 'NR==1 && /Medicine/' file
Medicine,200
This is how the logical AND (&&) condition is used in awk: the record is retrieved only if it is the first record (NR==1) and it is a Medicine record.
14. To print all those Medicine records whose amount is greater than 500:
$ awk -F, '/Medicine/ && $2>500' file
Medicine,600
15. To print all the Medicine records, and also those records whose amount is greater than 600:
$ awk -F, '/Medicine/ || $2>600' file
Medicine,200
Rent,900
Grocery,800
Medicine,600

In this article, we will see how to join lines based on a pattern, or join lines on encountering a pattern, using awk or gawk.
Let us assume a file with the following contents. There is a line with START in between; we have to join all the lines following the pattern START.
$ cat file
START
Unix
Linux
START
Solaris
Aix
SCO
1. Join the lines following the pattern START without any delimiter.
$ awk '/START/{if (NR!=1)print "";next}{printf $0}END{print "";}' file
UnixLinux
SolarisAixSCO
Basically, what we are trying to do is accumulate the lines following START and print them on encountering the next START statement. /START/ searches for lines containing the pattern START; the commands within the first {} run only on those lines. A blank line is printed if the line is not the first line (NR!=1); without this condition, a blank line would appear at the very beginning of the output, since a START is encountered at the beginning.
The next command prevents the remaining part of the program from being executed for the START lines. The second set of braces {} works only on the lines not containing START. It simply prints the line without a terminating newline character (printf), and hence all the lines after the pattern START end up on the same line. The END block prints a newline at the end, without which the prompt would appear at the end of the last line of output itself.
2. Join the lines following the pattern START with space as delimiter.
$ awk '/START/{if (NR!=1)print "";next}{printf "%s ",$0}END{print "";}' file
Unix Linux
Solaris Aix SCO
This is the same as the earlier one, except it uses the format specifier %s in order to accommodate an additional space, which is the delimiter in this case.
3. Join the lines following the pattern START with comma as delimiter.
$ awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}' file
Unix,Linux
Solaris,Aix,SCO
Here, we build up a complete line in a variable x, and print x whenever a new pattern starts. The command x=(!x)?$0:x","$0 is like the ternary operator in C or Perl: if x is empty, assign the current line ($0) to x, else append a comma and the current line to x. As a result, x contains the lines following the START pattern joined with commas. In the END block, x is printed, since for the last group there is no further START pattern to trigger printing the group.
4. Join the lines following the pattern START with comma as delimiter, including the pattern-matching line itself.

$ awk '/START/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' file
START,Unix,Linux
START,Solaris,Aix,SCO
The difference here is the missing next statement. Because next is not there, the commands in the second set of curly braces apply to the START line as well, and hence it also gets concatenated.
5. Join the lines following the pattern START with comma as delimiter, printing the pattern line as well; however, the pattern line itself should not be joined.
$ awk '/START/{if (x)print x;print;x="";next}{x=(!x)?$0:x","$0;}END{print x;}' file
START
Unix,Linux
START
Solaris,Aix,SCO

awk is very powerful when it comes to file formatting. In this article, we will discuss some wonderful grouping features of awk. awk can group data based on a column or field, or on a set of columns. It uses powerful associative arrays for grouping. If you are new to awk, this article will be easier to understand after going over the earlier article on parsing a simple CSV file using awk.
Let us take a sample CSV file with the below contents. The file is a kind of expense report containing items and their prices. As seen, some expense items have multiple entries.
$ cat file
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
1. To find the total of all numbers in the second column, i.e., the sum of all the prices:
$ awk -F"," '{x+=$2}END{print x}' file
3000
The delimiter (-F) used is a comma, since it is a comma-separated file. x+=$2 stands for x=x+$2. When a line is parsed, the second column ($2), which is the price, is added to the variable x. At the end, the variable x contains the sum. This example is the same as the awk example of finding the sum of all numbers in a file.
If your input file is a plain text file, the only difference being that the comma is not present, all you need to change is to remove the -F"," part from the above command. This is because the default delimiter in awk is whitespace.
2. To find the total sum of one particular group entry alone, in this case "Item1":
$ awk -F, '$1=="Item1"{x+=$2;}END{print x}' file
800
This gives us the total sum of all the entries pertaining to "Item1". In the earlier example, no condition was specified, since we wanted awk to work on every line or record. In this case, we want awk to work only on the records whose first column ($1) equals Item1.
3. If the data to be worked upon is present in a shell variable:
$ VAR="Item1"
$ awk -F, -v inp=$VAR '$1==inp{x+=$2;}END{print x}' file
800
-v is used to pass the shell variable to awk; the rest is the same as the last one.
4. To find the unique values of the first column:
$ awk -F, '{a[$1];}END{for (i in a)print i;}' file
Item1
Item2
Item3
Arrays in awk are associative, and are a very powerful feature. Associative arrays have an index and a corresponding value; for example, a["Jan"]=30 means that in the array a, "Jan" is an index with value 30. In our case here, we use only the index, without values. The command a[$1] works like this: when the first record is processed, an index "Item1" is stored in the array a. During the second record, a new index "Item2" is added; during the third, "Item3"; and so on. During the 4th record, since the index "Item1" is already present, no new index is added, and the same continues.
Once the file is processed completely, control goes to the END block, where we print all the index values. The for loop in awk comes in 2 variants: the C-language kind of for loop, and the one used for associative arrays.
for (i in a) means: for every index in the array a, the variable "i" holds the index value. In place of "i", any variable name can be used. Since there are 3 elements in the array, the loop runs 3 times, each time holding the value of one index in "i". By printing "i", we get the index values printed.
To understand the for loop better, look at this:
for (i in a)
{
print i;
}
Note: The order of the output of the above command may vary from system to system. Associative arrays do not store the indexes in sequence, and hence the order of the output need not be the same as the order in which the entries were added.
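For contrast, the C-style variant of the for loop, which does run in a guaranteed order:

```shell
# C-style for loop: initialise, test, increment
awk 'BEGIN{for (i = 1; i <= 3; i++) print i}'
```

This prints 1, 2 and 3 on separate lines.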
5. To find the sum of individual group records, i.e., to sum all records pertaining to Item1 alone, Item2 alone, and so on:
$ awk -F, '{a[$1]+=$2;}END{for(i in a)print i", "a[i];}' file
Item1, 800
Item2, 1300
Item3, 900
a[$1]+=$2 can be written as a[$1]=a[$1]+$2. It works like this: when the first record is processed, a["Item1"] is assigned 200 (a["Item1"]=200). During the second "Item1" record, a["Item1"] becomes 800 (200+600), and so on. In this way, every index in the array ends up holding the sum of its group.
And in the END block, we print both the index (i) and the value (a[i]), which is nothing but the sum.
6. To find the sum of all entries in the second column and add it as the last record:
$ awk -F"," '{x+=$2;print}END{print "Total,"x}' file
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
Total,3000
This is the same as the first example, except that along with accumulating the sum, every record is printed as well; and at the end, the "Total" record is also printed.
7. To print the maximum or the biggest record of every group:
$ awk -F, '{if (a[$1] < $2)a[$1]=$2;}END{for(i in a){print i,a[i];}}' OFS=, file
Item1,600
Item2,800
Item3,900
Before storing the value ($2) in the array, the current second-column value is compared with the existing value, and stored only if the current record's value is bigger. Finally, the array contains only the maximum value for every group. In the same way, just by changing the "less than" (<) symbol to "greater than" (>), we can find the smallest element of each group.
The syntax for if in awk is similar to the C language syntax:
if (condition)
{
    <code for true condition>
}
else
{
    <code for false condition>
}
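A runnable sketch of this if-else syntax on the same kind of expense data (the labels "high" and "low" are made up for illustration):

```shell
printf 'Item1,200\nItem3,900\n' > expenses.csv
# Tag each record depending on whether the price exceeds 500
awk -F, '{if ($2 > 500) print $1, "high"; else print $1, "low"}' expenses.csv
```

This prints "Item1 low" and "Item3 high".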
8. To find the count of entries against every group:
$ awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' file
Item1 2
Item2 2
Item3 1
a[$1]++ can be put as a[$1]=a[$1]+1. When the first "Item1" record is parsed, a["Item1"] becomes 1, and on encountering every further "Item1" record this count is incremented; the same goes for the other entries. This code simply increments the count by 1 for the respective index on encountering a record. Finally, on printing the array, we get the items and their respective counts.
9. To print only the first record of every group:
$ awk -F, '!a[$1]++' file
Item1,200
Item2,500
Item3,900
This one is a little tricky. In this awk command, there is only a condition, no action statement. As a result, if the condition is true, the current record gets printed by default.
!a[$1]++ : when the first record of a group is encountered, a[$1] is still 0, since ++ is postfix, and not (!) of 0 is 1, which is true; hence the first record gets printed. When the second "Item1" record is parsed, a[$1] is 1 (it becomes 2 after the command, since ++ is postfix); not (!) of 1 is 0, which is false, and the record does not get printed. In this way, the first record of every group gets printed.
Simply by removing the '!' operator, the above command will print all records other than the first record of each group.
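That variant, with the '!' dropped, looks like this; it keeps only the repeat records of each group:

```shell
printf 'Item1,200\nItem2,500\nItem3,900\nItem2,800\nItem1,600\n' > expenses.csv
# a[$1]++ evaluates to 0 (false) the first time a key is seen, non-zero afterwards
awk -F, 'a[$1]++' expenses.csv
```

This prints Item2,800 and Item1,600, the second occurrence of each repeated group.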
10. To join or concatenate the values of all group items, joining the values of the second column with a colon separator:
$ awk -F, '{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS=, file
Item1,200:600
Item2,500:800
Item3,900
The if condition is pretty simple: if there is already some value in a[$1], then append the current value to it with a colon delimiter; else just assign the current value to a[$1], since this is the first value.
To make the above if block clear, let me put it this way: "if (a[$1])" means "if a[$1] has some value".
if(a[$1])
a[$1]=a[$1]":"$2;
else
a[$1]=$2
The same can be achieved using the awk ternary operator, which works as in the C language:
$ awk -F, '{a[$1]=a[$1]?a[$1]":"$2:$2;}END{for (i in a)print i, a[i];}' OFS=, file
Item1,200:600
Item2,500:800
Item3,900
The ternary operator is a short form of the if-else condition. An example of the ternary operator is: x=x>10?"Yes":"No", meaning if x is greater than 10, assign "Yes" to x, else assign "No".
In the same way, a[$1]=a[$1]?a[$1]":"$2:$2 means: if a[$1] has some value, assign a[$1]":"$2 to a[$1], else simply assign $2 to a[$1].

Concatenate variables in awk:
One more thing to notice is the way string concatenation is done in awk. To concatenate 2 variables, place them next to each other with a space in between. Examples:
z=x y      # concatenate x and y
z=x":"y    # concatenate x and y with a colon separator
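A runnable sketch of both forms of concatenation:

```shell
# Adjacency concatenates; a quoted string in-between acts as a separator
awk 'BEGIN{x="Unix"; y="Linux"; print x y; print x ":" y}'
```

This prints UnixLinux, then Unix:Linux.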

In this article, we will see the different scenarios in which we need to split a file into multiple files using awk. A file can be split into multiple files based on a condition, based on a pattern, or simply because it is big and needs to be split into smaller files.
Sample File1:
Let us consider a sample file with the following contents:
$ cat file1
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
1. Split the file into 3 different files, one for each item, i.e., all records pertaining to Item1 into one file, records of Item2 into another, etc.
$ awk -F, '{print > $1}' file1
The files generated by the above command are as below:
$ cat Item1
Item1,200
Item1,600
$ cat Item3
Item3,900
$ cat Item2
Item2,500
Item2,800
This looks simple, right? print prints the entire line, and the line is printed to a file whose name is $1, the first field. This means the first record gets written to a file named 'Item1', the second record to 'Item2', the third to 'Item3', the 4th to 'Item2', and so on.

2. Split the files, giving the new file names an extension of .txt:
$ awk -F, '{print > $1".txt"}' file1
The only change here from the above is concatenating the string ".txt" to $1, the first field. As a result, the file names get the extension. The files created are below:
$ ls *.txt
Item2.txt Item1.txt Item3.txt
3. Split the files so that only the value (the second field) goes into the individual files, i.e., only the 2nd field in the new files, without the 1st:
$ awk -F, '{print $2 > $1".txt"}' file1
By default, the print command prints the entire record. Since we want only the second field to go to the output files, we do: print $2.
$ cat Item1.txt
200
600
4. Split the file so that all items whose value is greater than 500 are in the file "500G.txt", and the rest in the file "500L.txt":
$ awk -F, '{if($2<=500)print > "500L.txt";else print > "500G.txt"}' file1
The output files created will be as below:
$ cat 500L.txt
Item1,200
Item2,500
$ cat 500G.txt
Item3,900
Item2,800
Item1,600
Check the second field ($2): if it is less than or equal to 500, the record goes to "500L.txt", else to "500G.txt".
Another way to achieve the same thing is using the ternary operator in awk:
$ awk -F, '{x=($2<=500)?"500L.txt":"500G.txt"; print > x}' file1
The condition for greater or less than 500 is checked, and the appropriate file name is assigned to the variable x. The record is then written to the file named in x.
Sample File2:
Let us consider another file with a different set of contents. This file has the pattern 'START' at frequent intervals.
$ cat file2
START
Unix
Linux
START
Solaris
Aix
SCO
5. Split the file into multiple files at every occurrence of the pattern START:
$ awk '/START/{x="F"++i;}{print > x;}' file2
This command contains 2 sets of curly braces. Control goes to the first set only on encountering a line containing the pattern START; the second set is executed for every line, since there is no condition and hence it is always true.
On encountering the pattern START, a new file name is created and stored. When the first START comes, x contains "F1"; control moves to the next set of braces, and the record is written to F1. Subsequent records go to the file "F1" till the next START comes. On encountering the next START, x contains "F2", and subsequent lines go to "F2" till the next START, and so on.
$ cat F1
START
Unix
Linux
Solaris
$ cat F2
START
Aix
SCO
6. Split the file into multiple files at every occurrence of the pattern START, but the line containing the pattern should not be in the new files.
$ awk '/START/{x="F"++i;next}{print > x;}' file2
The only difference from the above is the inclusion of the next command. Due to next, a line containing START enters the first set of curly braces, and awk then immediately starts reading the next line. As a result, the START lines never reach the second set of curly braces, and hence START does not appear in the split files.
$ cat F1
Unix
Linux
$ cat F2
Solaris
Aix
SCO
7. Split the file by inserting a header record in every new file.
$ awk '/START/{x="F"++i;print "ANY HEADER" > x;next}{print > x;}' file2
The change here from the earlier one is this: before the next command, we write the header record into the file. This is the right place to write the header record, since this is where the file is first created.
$ cat F1
ANY HEADER
Unix
Linux

$ cat F2
ANY HEADER
Solaris
Aix
SCO
Sample File3:
Let us consider a file with the sample contents:
$ cat file3
Unix
Linux
Solaris
AIX
SCO
8. Split the file into multiple files at every 3rd line, i.e., the first 3 lines into F1, the next 3 lines into F2, and so on.
$ awk 'NR%3==1{x="F"++i;}{print > x}' file3
In other words, this is nothing but splitting the file into equal parts. The condition does the trick here. NR is the line number of the current record; NR%3 equals 1 for every 3rd line, such as the 1st, 4th, 7th and so on. At every such line, the file name in the variable x is changed, and hence the records are written to the appropriate files.
$ cat F1
Unix
Linux
Solaris
$ cat F2
AIX
SCO
Sample File4:
Let us update the above file with a header and trailer:
$ cat file4
HEADER
Unix
Linux
Solaris
AIX
SCO
TRAILER
9. Split the file at every 3rd line, without the header and trailer in the new files.
$ sed '1d;$d;' file4 | awk 'NR%3==1{x="F"++i;}{print > x}'
The earlier awk command does the work for us; the only thing needed is to feed it the file without the header and trailer, and sed does that for us: '1d' deletes the 1st line, '$d' deletes the last line.
$ cat F1
Unix
Linux
Solaris
$ cat F2
AIX
SCO
10. Split the file at every 3rd line, retaining the header and trailer in every file.
$ awk 'BEGIN{getline f;}NR%3==2{x="F"++i;a[i]=x;print f>x;}{print > x}END{for(j=1;j<i;j++)print> a[j];}' file4
This one is a little tricky. Before the file is processed, the first line is read into the variable f using getline. NR%3 is checked against 2 instead of 1 as in the earlier case, because the first line is a header and hence we need to split the file at the 2nd, 5th, 8th lines, and so on. All the file names are stored in the array "a" for later processing.
Without the END block, all the files would have the header record, but only the last file would have the trailer record. The END block precisely writes the trailer record to all the files other than the last one.
$ cat F1
HEADER
Unix
Linux
Solaris
TRAILER
$ cat F2
HEADER
AIX
SCO
TRAILER

In this article, we will see how to use awk to read or parse text or CSV files containing multiple delimiters or repeating delimiters. We will also discuss some peculiar delimiters and how to handle them using awk.
Let us consider a sample file. This colon-separated file contains an item, a purchase year, and a set of prices separated by semi-colons.
$ cat file
Item1:2010:10;20;30
Item2:2012:12;29;19
Item3:2014:15;50;61
1. To print the 3rd column which contains the prices:
$ awk -F: '{print $3}' file
10;20;30
12;29;19
15;50;61
This is straightforward. By specifying colon (:) with the -F option, the 3rd column can be retrieved using the $3 variable.
2. To print the 2nd price component alone:
$ awk -F '[:;]' '{print $4}' file
20
29
50
What did we do here? We specified multiple delimiters, one being : and the other ;. How does awk parse the file? It is simple: while reading the line, whenever the delimiter : or ; is encountered, the part read so far is stored in $1. Reading continues, and on encountering the next delimiter, the part read is stored in $2, and this continues till the end of the line. In this way, $4 contains the second component of the price field above.
Note: Always keep in mind that multiple delimiters have to be specified inside square brackets ([;:]).
3. To sum the individual components of the 3rd column and print it:
$ awk -F '[;:]' '{$3=$3+$4+$5;print $1,$2,$3}' OFS=: file
Item1:2010:60
Item2:2012:60
Item3:2014:126
The individual components of the price ($3) column are available in $3, $4 and $5. Simply sum them up, store the result in $3, and print all the fields. OFS (output field separator) is used to specify the delimiter while printing the output.
Note: If we do not use OFS, awk will print the fields using the default output delimiter, which is a space.
4. Un-group or re-group every record depending on the price column:
$ awk -F '[;:]' '{for(i=3;i<=5;i++){print $1,$2,$i;}}' OFS=":" file
Item1:2010:10
Item1:2010:20
Item1:2010:30
Item2:2012:12
Item2:2012:29
Item2:2012:19
Item3:2014:15
Item3:2014:50
Item3:2014:61
The requirement here is that a new record has to be created for every component of the price column. Simply, a loop runs over columns 3 to 5, and each time a record is framed using one price component.


5-6. Read a file in which the delimiter is square brackets:
$ cat file
123;abc[202];124
125;abc[203];124
127;abc[204];124
5. To print the value present within the brackets:
$ awk -F '[][]' '{print $2}' file
202
203
204
At first sight, the delimiter used in the above command might be confusing, but it is simple. 2 delimiters are to be used in this case: one is [ and the other is ]. Since the delimiters themselves are square brackets, which have to be placed within square brackets, it looks tricky at first.
Note: If square brackets are the delimiters, they should be put in exactly this way, meaning ] first, followed by [. Using a delimiter like -F '[[]]' will give a different interpretation altogether.
6. To print the first value, the value within brackets, and the last value:
$ awk -F '[][;]' '{print $1,$3,$5}' OFS=";" file
123;202;124
125;203;124
127;204;124
3 delimiters are used in this case, with the semi-colon also included.
7-8. Read or parse a file containing a series of delimiters:
$ cat file
123;;;202;;;203
124;;;213;;;203
125;;;222;;;203
The above file contains a series of 3 semi-colons between every 2 values.
7. Using the multiple delimiter method:
$ awk -F'[;;;]' '{print $2}' file

Blank output!!! The above delimiter, though specified as 3 semi-colons, is as good as a single delimiter, the semi-colon (;), since they are all the same character. Due to this, $2 is the value between the first and the second semi-colon, which in our case is blank, and hence there is no output.
8. Using the delimiter without square brackets:
$ awk -F';;;' '{print $2}' file
202
213
222

The expected output!!! No square brackets are used, and we got the output we wanted.
Difference between using square brackets and not using it : When a set of delimi
ters are specified using square brackets, it means an OR condition of the delimi
ters. For example, -F '[;:]' means to separate the contents either on encounteri
ng ':' or ';'. However, when a set of delimiters are specified without using squ
are brackets, awk looks at them literally to separate the contents. For example,
-F ':;' means to separate the contents only on encountering a colon followed by
a semi-colon. Hence, in the last example, the file contents are separated only
when a set of 3 continuous semi-colons are encountered.
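The bracketed-vs-literal difference can be seen side by side. A minimal sketch (the file path /tmp/demo and its contents are illustrative):

```shell
# '1;:2;:3' has the two separators ';' and ':' back to back
printf '1;:2;:3\n' > /tmp/demo

# brackets: split on ';' OR ':' individually -> an empty field sits between them
a=$(awk -F '[;:]' '{print $2}' /tmp/demo)

# no brackets: split only on the literal two-character sequence ';:'
b=$(awk -F ';:' '{print $2}' /tmp/demo)

echo "bracketed: '$a'  literal: '$b'"
```

With brackets, $2 is the empty field between ';' and ':'; without brackets, $2 is the real second value.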
9. Read or parse a file containing a series of delimiters of varying lengths:
In the below file, the 1st and 2nd columns are separated by 3 semi-colons, whereas the 2nd and 3rd are separated by 4 semi-colons:
$ cat file
123;;;202;;;;203
124;;;213;;;;203
125;;;222;;;;203
$ awk -F';'+ '{print $2,$3}' file
202 203
213 203
222 203
The '+' is a regular expression operator. It indicates one or more occurrences of the previous character. ';'+ therefore matches one or more semi-colons, and hence both the 3 semi-colons and the 4 semi-colons get matched.
10. Using a word as a delimiter:
$ cat file
123Unix203
124Unix203
125Unix203
Retrieve the numbers before and after the word "Unix" :
$ awk -F'Unix' '{print $1, $2}' file
123 203
124 203
125 203
In this case, we use the word "Unix" as the delimiter, and hence $1 and $2 contain the appropriate values. Keep in mind, it is not just special characters that can be used as delimiters; alphabets and even whole words can be used as well.

We will now see how to access awk variables in the shell, or in other words, how to access awk variables as shell variables. Let us see the different ways in which we can achieve this.
Let us consider a file with the sample contents as below:
$ cat file
Linux 20
Solaris 30
HPUX 40
1. Access the value of the entry "Solaris" in a shell variable, say x:
$ x=`awk '/Solaris/{a=$2;print a}' file`
$ echo $x
30
This approach is fine as long as we want to access only one value. What if we ha
ve to access multiple values in shell?
2. Access the value of "Solaris" in x, and "Linux" in y:
$ z=`awk '{if($1=="Solaris")print "x="$2;if($1=="Linux")print "y="$2}' file`
$ echo "$z"
y=20
x=30
$ eval $z
$ echo $x
30
$ echo $y
20
awk builds the assignments "x=30" and "y=20" and prints them, and this output is collected in the shell variable "z". The eval command evaluates the variable, meaning it executes the commands present in it. As a result, "x=30" and "y=20" get executed, and x and y become shell variables with the appropriate values.
3. Same using the sourcing method:
$ awk '{if($1=="Solaris")print "x="$2;if($1=="Linux")print "y="$2}' file > f1
$ source f1
$ echo $x
30
$ echo $y
20
Here, instead of collecting the output of awk command in a variable, it is re-di
rected to a temporary file. The file is then sourced or in other words executed
in the same shell. As a result, "x" and "y" become shell variables.
Note: Depending on the shell being used, the appropriate way of sourcing has to
be done. The "source" command is used here since the default shell is bash.
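Another way, avoiding both eval and a temporary file, is to capture awk's output with command substitution and split it with set. A sketch using the same sample data (the /tmp path is illustrative):

```shell
cat > /tmp/os_file <<'EOF'
Linux 20
Solaris 30
HPUX 40
EOF

# awk prints the two values in file order (Linux first, then Solaris);
# set -- assigns them to the positional parameters $1 and $2
set -- $(awk '$1=="Linux" || $1=="Solaris" {print $2}' /tmp/os_file)
y=$1 x=$2
echo "x=$x y=$y"
```

This relies on the output order matching the file order, so it suits a small, known set of values.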

How to insert or add a column between columns, remove columns, or update a particular column? Let us discuss these in this article.
Consider a CSV file with the following contents:
$ cat file
Unix,10,A
Linux,30,B
Solaris,40,C
Fedora,20,D
Ubuntu,50,E
1. To insert a new column (say serial number) before the 1st column
$ awk -F, '{$1=++i FS $1;}1' OFS=, file
1,Unix,10,A
2,Linux,30,B
3,Solaris,40,C
4,Fedora,20,D
5,Ubuntu,50,E
$1=++i FS $1 => Space is the concatenation operator in awk. This expression concatenates the new field (++i) with the 1st field, along with the delimiter (FS), and assigns the result back to the 1st field ($1). FS contains the input field separator.
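The same command, runnable end to end on a recreated sample (the /tmp path is illustrative):

```shell
cat > /tmp/csvfile <<'EOF'
Unix,10,A
Linux,30,B
EOF

# ++i generates the serial number; FS (",") glues it in front of the old $1
out=$(awk -F, '{$1=++i FS $1;}1' OFS=, /tmp/csvfile)
echo "$out"
```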
2. To insert a new column after the last column
$ awk -F, '{$(NF+1)=++i;}1' OFS=, file
Unix,10,A,1
Linux,30,B,2
Solaris,40,C,3
Fedora,20,D,4
Ubuntu,50,E,5
$NF indicates the value of the last column. Hence, by assigning something to $(NF+1), a new field is inserted at the end automatically.
3. Add 2 columns after the last column:
$ awk -F, '{$(NF+1)=++i FS "X";}1' OFS=, file
Unix,10,A,1,X
Linux,30,B,2,X
Solaris,40,C,3,X
Fedora,20,D,4,X
Ubuntu,50,E,5,X

The explanation given for the above 2 examples holds good here.
4. To insert a column before the 2nd last column
$ awk -F, '{$(NF-1)=++i FS $(NF-1);}1' OFS=, file
Unix,1,10,A
Linux,2,30,B
Solaris,3,40,C
Fedora,4,20,D
Ubuntu,5,50,E
NF-1 points to the 2nd last column. Hence, concatenating the serial number at the beginning of $(NF-1) ends up inserting a column before the 2nd last.
5. Update 2nd column by adding 10 to the variable:
$ awk -F, '{$2+=10;}1' OFS=, file
Unix,20,A
Linux,40,B
Solaris,50,C
Fedora,30,D
Ubuntu,60,E
$2 is incremented by 10.
6. Convert a specific column (1st column) to uppercase in the CSV file:
$ awk -F, '{$1=toupper($1)}1' OFS=, file
UNIX,10,A
LINUX,30,B
SOLARIS,40,C
FEDORA,20,D
UBUNTU,50,E
Using the toupper function of the awk, the 1st column is converted from lowercas
e to uppercase.
7. Extract only first 3 characters of a specific column(1st column):
$ awk -F, '{$1=substr($1,1,3)}1' OFS=, file
Uni,10,A
Lin,30,B
Sol,40,C
Fed,20,D
Ubu,50,E
Using the substr function of awk, a substring of only the first few characters can be retrieved. Note that awk string positions start at 1, so the extraction starts from position 1.
8. Empty the value in the 2nd column:
$ awk -F, '{$2="";}1' OFS=, file
Unix,,A
Linux,,B
Solaris,,C
Fedora,,D
Ubuntu,,E
Set the variable of the 2nd column ($2) to blank (""). Now, when the line is printed, $2 will be blank.
9. Remove/Delete the 2nd column from the CSV file:
$ awk -F, '{for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x=2 file
Unix,A
Linux,B
Solaris,C
Fedora,D
Ubuntu,E
By just emptying a particular column, the column stays as-is with an empty value. To remove a column, all the subsequent columns from that position need to be advanced one position ahead. The for loop loops over all the fields. Using the ternary operator, every column except the 2nd is concatenated to the variable "f" using FS as the delimiter. At the end, the variable "f", which contains the updated record, is printed. The column to be removed is passed through the awk variable "x", and hence just by setting the appropriate number in x, any specific column can be removed.
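A runnable version of the same idea; passing the column number with -v x=2 is equivalent to the trailing assignment, and resetting f at the start of each line is slightly safer than clearing it at the end:

```shell
cat > /tmp/csvfile <<'EOF'
Unix,10,A
Linux,30,B
EOF

out=$(awk -F, -v x=2 '{
    f = ""
    for (i = 1; i <= NF; i++)          # walk every field
        if (i != x)                    # skip the column to be removed
            f = (f == "") ? $i : f FS $i
    print f
}' /tmp/csvfile)
echo "$out"
```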
10. Join the 3rd column with the 2nd column using ':' and remove the 3rd column:
$ awk -F, '{$2=$2":"$x;for(i=1;i<=NF;i++)if(i!=x)f=f?f FS $i:$i;print f;f=""}' x
=3 file
Unix,10:A
Linux,30:B
Solaris,40:C
Fedora,20:D
Ubuntu,50:E
Almost the same as the last example, except that first the 3rd column ($3) is concatenated with the 2nd column ($2), and then the 3rd column is removed.

gawk has 3 functions to calculate date and time:
systime
strftime
mktime
Let us see in this article how to use these functions.

systime:
This function is equivalent to the Unix date command (date +%s). It gives the Unix time, the total number of seconds elapsed since the epoch (01-01-1970 00:00:00).
$ echo | awk '{print systime();}'
1358146640
Note: systime function does not take any arguments.
strftime:
A very common function used in gawk to format the systime into a calendar for
mat. Using this function, from the systime, the year, month, date, hours, mins a
nd seconds can be separated.
Syntax:
strftime (<format specifiers>,unix time);
1. Printing current date time using strftime:
$ echo | awk '{print strftime("%d-%m-%y %H-%M-%S",systime());}'
14-01-13 12-37-45
strftime takes format specifiers which are same as the format specifiers avai
lable with the date command. %d for date, %m for month number (1 to 12), %y for
the 2 digit year number, %H for the hour in 24 hour format, %M for minutes and %
S for seconds. In this way, strftime converts Unix time into a date string.
2. Display current date time using strftime without systime:
$ echo | awk '{print strftime("%d-%m-%y %H-%M-%S");}'
14-01-13 12-38-08
Both the arguments of strftime are optional. When the timestamp is not provid
ed, it takes the systime by default.
3. strftime with no arguments:
$ echo | awk '{print strftime();}'
Mon Jan 14 12:30:05 IST 2013
strftime without the format specifiers provides the output in the default outp
ut format as the Unix date command.
mktime:
mktime function converts any given date time string into a Unix time, which
is of the systime format.
Syntax:
mktime(date time string) # where the date time string contains at least 6 components in the following order: YYYY MM DD HH MM SS
1. Printing timestamp for a specific date time :
$ echo | awk '{print mktime("2012 12 21 0 0 0");}'
1356028200
This gives the Unix time for the date 21-Dec-12.
2. Using strftime with mktime:
$ echo | awk '{print strftime("%d-%m-%Y",mktime("2012 12 21 0 0 0"));}'

21-12-2012
The output of mktime can be validated by formatting the mktime output using t
he strftime function as above.
3. Negative date in mktime:
$ echo | awk '{print strftime("%d-%m-%Y",mktime("2012 12 -1 0 0 0"));}'
29-11-2012
mktime can take negative values as well. -1 in the date position indicates one
day before the date specified which in this case leads to 29th Nov 2012.
4. Negative hour value in mktime:
$ echo | awk '{print strftime("%d-%m-%Y %H-%M-%S",mktime("2012 12 3 -2 0 0"));}'
02-12-2012 22-00-00
-2 in the hours position indicates 2 hours before the specified date time wh
ich in this case leads to "2-Dec-2012 22" hours.

How to find the time difference between timestamps using gawk?


Let us consider a file where the
1st column is the Process name,
2nd is the start time of the process, and
3rd column is the end time of the process.
The requirement is to find the time consumed by the process which is the differe
nce between the start and the end times.
1. File in which the date and time component are separated by a space:
$ cat file
P1,2012 12 4 21 36 48,2012 12 4 22 26 53
P2,2012 12 4 20 36 48,2012 12 4 21 21 23
P3,2012 12 4 18 36 48,2012 12 4 20 12 35
Time difference in seconds:
$ awk -F, '{d2=mktime($3);d1=mktime($2);print $1","d2-d1,"secs";}' file
P1,3005 secs
P2,2675 secs
P3,5747 secs
Using mktime function, the Unix time is calculated for the date time strings,
and their difference gives us the time elapsed in seconds.
2. File with the different date format :
$ cat file
P1,2012-12-4 21:36:48,2012-12-4 22:26:53
P2,2012-12-4 20:36:48,2012-12-4 21:21:23
P3,2012-12-4 18:36:48,2012-12-4 20:12:35
Note: This file has the start time and end time in different formats

Difference in seconds:
$ awk -F, '{gsub(/[-:]/," ",$2);gsub(/[-:]/," ",$3);d2=mktime($3);d1=mktime($2);
print $1","d2-d1,"secs";}' file
P1,3005 secs
P2,2675 secs
P3,5747 secs
Using gsub function, the '-' and ':' are replaced with a space. This is done be
cause the mktime function arguments should be space separated.
Difference in minutes:
$ awk -F, '{gsub(/[-:]/," ",$2);gsub(/[-:]/," ",$3);d2=mktime($3);d1=mktime($2);
print $1","(d2-d1)/60,"mins";}' file
P1,50.0833 mins
P2,44.5833 mins
P3,95.7833 mins
Just by dividing the seconds difference by 60 gives us the difference in minutes
.
3. File with only date, without time part:
$ cat file
P1,2012-12-4,2012-12-6
P2,2012-12-4,2012-12-8
P3,2012-12-4,2012-12-5
Note: The start and end time has only the date components, no time components
Difference in seconds:
$ awk -F, '{gsub(/-/," ",$2);gsub(/-/," ",$3);$2=$2" 0 0 0";$3=$3" 0 0 0";d2=mkt
ime($3);d1=mktime($2);print $1","d2-d1,"secs";}' file
P1,172800 secs
P2,345600 secs
P3,86400 secs
In addition to replacing the '-' and ':' with spaces, 0's are appended to the da
te field since the mktime requires the date in 6 column format.
Difference in days:
$ awk -F, '{gsub(/-/," ",$2);gsub(/-/," ",$3);$2=$2" 0 0 0";$3=$3" 0 0 0";d2=mkt
ime($3);d1=mktime($2);print $1","(d2-d1)/86400,"days";}' file
P1,2 days
P2,4 days
P3,1 days
A day has 86400(24*60*60) seconds, and hence by dividing the duration in seco
nds by 86400, the duration in days can be obtained.


We are going to see how to delete or remove a particular line or a particular pattern from a file using the sed command.
Let us consider a file with the sample contents as below:
$ cat file
Cygwin
Unix
Linux
Solaris
AIX
1. Delete the 1st line or the header line:
$ sed '1d' file
Unix
Linux
Solaris
AIX
d command is to delete a line. 1d means to delete the first line.
The above command shows the file content after deleting the first line. However, the source file remains unchanged. To update the original file itself with this deletion, or in other words to make the change permanent in the source file, use the -i option. The same is applicable for all the other examples.
sed -i '1d' file
Note: -i option in sed is available only if it is GNU sed. If not GNU, re-di
rect the sed output to a file, and rename the output file to the original file.
2. Delete a particular line, 3rd line in this case:
$ sed '3d' file
Cygwin
Unix
Solaris
AIX

3. Delete the last line or the trailer line of the file:


$ sed '$d' file
Cygwin
Unix
Linux
Solaris
$ indicates the last line.
4. Delete a range of lines, from 2nd line till 4th line:
$ sed '2,4d' file
Cygwin
AIX
The range is specified using the comma operator.
5. Delete lines other than the specified range, line other than 2nd till 4th her
e:
$ sed '2,4!d' file
Unix
Linux
Solaris
The ! operator negates the condition.
6. Delete the first line AND the last line of a file, i.e, the header and traile
r line of a file.
$ sed '1d;$d' file
Unix
Linux
Solaris
Multiple conditions are separated using the ';' operator. Similarly, to delete the 2nd and 4th lines, you can use: '2d;4d'.
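Line addresses can be combined freely this way. For instance, deleting the 2nd and 4th lines of a small sample (the /tmp path is illustrative):

```shell
printf 'one\ntwo\nthree\nfour\nfive\n' > /tmp/lines

# two single-line deletes joined with ';'
out=$(sed '2d;4d' /tmp/lines)
echo "$out"
```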
7. Delete all lines beginning with a particular character, 'L' in this case:
$ sed '/^L/d' file
Cygwin
Unix
Solaris
AIX
'^L' indicates lines beginning with L.
8. Delete all lines ending with a particular character, 'x' in this case:
$ sed '/x$/d' file
Cygwin
Solaris
AIX
'x$' indicates lines ending with 'x'. AIX did not get deleted because the
X is capital.

9. Delete all lines ending with either x or X, i.e case-insensitive delete:


$ sed '/[xX]$/d' file
Cygwin
Solaris
[xX] indicates either 'x' or 'X'. So, this will delete all lines ending with
either small 'x' or capital 'X'.
10. Delete all blank lines in the file
$ sed '/^$/d' file
Cygwin
Unix
Linux
Solaris
AIX
'^$' indicates lines containing nothing, and hence the empty lines get deleted. However, this won't delete lines containing only some blank spaces.
11. Delete all lines which are empty or which contains just some blank spaces:
$ sed '/^ *$/d' file
Cygwin
Unix
Linux
Solaris
AIX
'*' indicates 0 or more occurrences of the previous character. '^ *$' indi
cates a line containing zero or more spaces. Hence, this will delete all lines w
hich are either empty or lines with only some blank spaces.
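The space-only pattern misses tabs; a POSIX character class covers spaces and tabs alike. A sketch (the /tmp path is illustrative):

```shell
# second line holds a space and a tab, third line is truly empty
printf 'Cygwin\n \t\n\nUnix\n' > /tmp/blankfile

# [[:space:]] matches spaces, tabs and other whitespace
out=$(sed '/^[[:space:]]*$/d' /tmp/blankfile)
echo "$out"
```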
12. Delete all lines which are entirely in capital letters:
$ sed '/^[A-Z]*$/d' file
Cygwin
Unix
Linux
Solaris
[A-Z] indicates any character matching the alphabets in capital.
13. Delete the lines containing the pattern 'Unix'.
$ sed '/Unix/d' file
Cygwin
Linux
Solaris
AIX
The pattern is specified within a pair of slashes.
14. Delete the lines NOT containing the pattern 'Unix':
$ sed '/Unix/!d' file
Unix
15. Delete the lines containing the pattern 'Unix' OR 'Linux':

$ sed '/Unix\|Linux/d' file


Cygwin
Solaris
AIX
The OR condition is specified using the | operator. In order not to get the
pipe(|) interpreted as a literal, it is escaped using a backslash.
16. Delete the lines starting from the 1st line till encountering the pattern 'L
inux':
$ sed '1,/Linux/d' file
Solaris
AIX
Earlier, we saw how to delete a range of lines. Range can be in many combina
tions: Line ranges, pattern ranges, line and pattern, pattern and line.
17. Delete the lines starting from the pattern 'Linux' till the last line:
$ sed '/Linux/,$d' file
Cygwin
Unix
18. Delete the last line ONLY if it contains the pattern 'AIX':
$ sed '${/AIX/d;}' file
Cygwin
Unix
Linux
Solaris
$ is for the last line. To delete a particular line only if it contains th
e pattern AIX, put the line number in place of the $. This is how we can impleme
nt the 'if' condition in sed.
19. Delete the last line ONLY if it contains either the pattern 'AIX' or 'HPUX':
$ sed '${/AIX\|HPUX/d;}' file
Cygwin
Unix
Linux
Solaris
20. Delete the lines containing the pattern 'Solaris' only if it is present in t
he lines from 1 to 4.
$ sed '1,4{/Solaris/d;}' file
Cygwin
Unix
Linux
AIX
This will only delete the lines containing the pattern Solaris only if it
is in the 1st four lines, nowhere else.
21. Delete the line containing the pattern 'Unix' and also the next line:
$ sed '/Unix/{N;d;}' file

Cygwin
Solaris
AIX
N command reads the next line in the pattern space. d deletes the entire pa
ttern space which contains the current and the next line.
22. Delete only the next line containing the pattern 'Unix', not the very line:
$ sed '/Unix/{N;s/\n.*//;}' file
Cygwin
Unix
Solaris
AIX
Using the substitution command s, we delete from the newline character till the end, which effectively deletes the next line after the line containing the pattern Unix.
23. Delete the line containing the pattern 'Linux', also the line before the pat
tern:
$ sed -n '/Linux/{s/.*//;x;d;};x;p;${x;p;}' file | sed '/^$/d'
Cygwin
Solaris
AIX
This one is a little tricky. In order to delete the line prior to the pattern, we store every line in a buffer called the hold space. Whenever the pattern matches, we delete the content present in both the pattern space, which contains the current line, and the hold space, which contains the previous line.
Let me explain this command: 'x;p;' ; This gets executed for every line. x ex
changes the content of pattern space with hold space. p prints the pattern space
. As a result, every time, the current line goes to hold space, and the previous
line comes to pattern space and gets printed. When the pattern /Linux/ matches,
we empty(s/.*//) the pattern space, and exchange(x) with the hold space(as a re
sult of which the hold space becomes empty) and delete(d) the pattern space whic
h contains the previous line. And hence, the current and the previous line gets
deleted on encountering the pattern Linux. The ${x;p;} is to print the last line
which will remain in the hold space if left.
The second part of sed is to remove the empty lines created by the first sed c
ommand.
24. Delete only the line prior to the line containing the pattern 'Linux', not
the very line:
$ sed -n '/Linux/{x;d;};1h;1!{x;p;};${x;p;}' file
Cygwin
Linux
Solaris
AIX
This is almost same as the last one with few changes. On encountering the p
attern /Linux/, we exchange(x) and delete(d). As a result of exchange, the curre
nt line remains in hold space, and the previous line which came into pattern spa
ce got deleted.
1h;1!{x;p;} - 1h moves the current line to the hold space, but only for the first line. For all other lines, exchange and print. This could easily have been simply: x;p. The drawback is that it would give an empty line at the beginning, because during the first exchange between the pattern space and the hold space, an empty line comes to the pattern space since the hold space starts out empty.


25. Delete the line containing the pattern 'Linux', the line before it, and the line after it:
$ sed -n '/Linux/{N;s/.*//;x;d;};x;p;${x;p;}' file | sed '/^$/d'
Cygwin
AIX
With the explanations of the last 2 commands, this should be fairly simple
to understand.
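To convince yourself, the whole pipeline can be replayed on the sample file. Note that emptying a two-line pattern space with s/.*// relies on GNU sed, where '.' also matches the newline embedded by N:

```shell
printf 'Cygwin\nUnix\nLinux\nSolaris\nAIX\n' > /tmp/osfile

# delete the /Linux/ line plus its neighbours; second sed drops the
# placeholder blank lines the hold-space shuffle leaves behind
out=$(sed -n '/Linux/{N;s/.*//;x;d;};x;p;${x;p;}' /tmp/osfile | sed '/^$/d')
echo "$out"
```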

We will see a few more frequent search and replace operations done on files using sed.
Let us consider a file with the following contents:
$ cat file
RE01:EMP1:25:2500
RE02:EMP2:26:2650
RE03:EMP3:24:3500
RE04:EMP4:27:2900
1. To replace the first two(2) characters of a string or a line with say "XX":
$ sed 's/^../XX/' file
XX01:EMP1:25:2500
XX02:EMP2:26:2650
XX03:EMP3:24:3500
XX04:EMP4:27:2900
The "^" symbol anchors the match at the beginning of the line. The two dots match 2 characters. The same thing can also be achieved without the caret (^) symbol, as shown below, because by default sed starts matching from the beginning of the line.
sed 's/../XX/' file
2. In the same lines, to remove or delete the first two characters of a string
or a line.
$ sed 's/^..//' file
01:EMP1:25:2500
02:EMP2:26:2650
03:EMP3:24:3500
04:EMP4:27:2900
Here the string to be substituted is empty, and hence gets deleted.
3. Similarly, to remove/delete the last two characters in the string:
$ sed 's/..$//' file
RE01:EMP1:25:25
RE02:EMP2:26:26

RE03:EMP3:24:35
RE04:EMP4:27:29
4. To add a string to the end of a line:
$ sed 's/$/.Rs/' file
RE01:EMP1:25:2500.Rs
RE02:EMP2:26:2650.Rs
RE03:EMP3:24:3500.Rs
RE04:EMP4:27:2900.Rs
Here the string ".Rs" is being added to the end of the line.
5. To add empty spaces to the beginning of every line in a file:
$ sed 's/^/ /' file
 RE01:EMP1:25:Rs.2500
 RE02:EMP2:26:Rs.2650
 RE03:EMP3:24:Rs.3500
 RE04:EMP4:27:Rs.2900
To make any sed command's change permanent in the file, or in other words, to save or update the changes in the same file, use the option "-i":
$ sed -i 's/^/ /' file
$ cat file
 RE01:EMP1:25:Rs.2500
 RE02:EMP2:26:Rs.2650
 RE03:EMP3:24:Rs.3500
 RE04:EMP4:27:Rs.2900
6. To remove empty spaces from the beginning of a line:
$ sed 's/^ *//' file
RE01:EMP1:25:2500
RE02:EMP2:26:2650
RE03:EMP3:24:3500
RE04:EMP4:27:2900
"^ *"(space followed by a *) indicates a sequence of spaces in the beginni
ng.
7. To remove empty spaces from beginning and end of string.
$ sed 's/^ *//; s/ *$//' file
RE01:EMP1:25:2500
RE02:EMP2:26:2650
RE03:EMP3:24:3500
RE04:EMP4:27:2900
This example also shows to use multiple sed command substitutions as part of
the same command.
The same command can also be written as :
sed -e 's/^ *//' -e 's/ *$//' file
8. To add a character before and after a string. Or in other words, to encapsula
te the string with something:

$ sed 's/.*/"&"/' file


"RE01:EMP1:25:Rs.2500"
"RE02:EMP2:26:Rs.2650"
"RE03:EMP3:24:Rs.3500"
"RE04:EMP4:27:Rs.2900"
".*" matches the entire line. '&' denotes the pattern matched. The substitu
tion pattern "&" indicates to put a double-quote at the beginning and end of the
string.
9. To remove the first and last character of a string:
$ sed 's/^.//;s/.$//' file
RE01:EMP1:25:2500
RE02:EMP2:26:2650
RE03:EMP3:24:3500
RE04:EMP4:27:2900
10. To remove everything till the first digit comes :
$ sed 's/^[^0-9]*//' file
01:EMP1:25:2500
02:EMP2:26:2650
03:EMP3:24:3500
04:EMP4:27:2900
Similarly, to remove everything till the first alphabet comes:
sed 's/^[^a-zA-Z]*//' file
11. To remove a numerical word from the end of the string:
$ sed 's/[0-9]*$//' file
RE01:EMP1:25:
RE02:EMP2:26:
RE03:EMP3:24:
RE04:EMP4:27:
12. To get the last column of a file with a delimiter. The delimiter in this cas
e is ":".
$ sed 's/.*://' file
2500
2650
3500
2900
For a moment, one might expect the output of the above command to be the whole line with only the first column and its delimiter removed. But sed's matching is greedy: '.*:' consumes everything up to the last colon, and hence we only get the content after the last colon.
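Greediness also shows how to pick the other end of the line: anchoring .* on the other side of the colon keeps the first column instead. A sketch (the /tmp path is illustrative):

```shell
printf 'RE01:EMP1:25:2500\n' > /tmp/recfile

last=$(sed 's/.*://' /tmp/recfile)    # greedy .* eats up to the LAST colon
first=$(sed 's/:.*//' /tmp/recfile)   # deletes from the FIRST colon to the end
echo "$first $last"
```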
13. To convert the entire line into lower case:
$ sed 's/.*/\L&/' file
re01:emp1:25:rs.2500
re02:emp2:26:rs.2650
re03:emp3:24:rs.3500
re04:emp4:27:rs.2900

\L is the sed switch to convert to lower case. The operand following the \
L gets converted. Since &(the pattern matched, which is the entire line in this
case) is following \L, the entire line gets converted to lower case.
14. To convert the entire line or a string to uppercase :
$ sed 's/.*/\U&/' file
RE01:EMP1:25:RS.2500
RE02:EMP2:26:RS.2650
RE03:EMP3:24:RS.3500
RE04:EMP4:27:RS.2900
Same as above, \U instead of \L.
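\L and \U are GNU sed extensions. On systems without GNU sed, tr performs the same whole-line case conversion; a portable sketch:

```shell
# whole-line lowercase without GNU sed
out=$(printf 'RE01:EMP1:25:RS.2500\n' | tr 'A-Z' 'a-z')
echo "$out"
```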

We are going to see the different options sed provides to selectively print contents in a file. Let us take a sample file with the following contents:
$ cat file
Gmail 10
Yahoo 20
Redif 18

1. To print the entire file contents:


$ sed '' file
Gmail 10
Yahoo 20
Redif 18

2. To print only the line containing 'Gmail'. In other words, to simulate the gr
ep command:
$ sed '/Gmail/p' file
Gmail 10
Gmail 10
Yahoo 20
Redif 18

Within the slashes, we specify the pattern to match. The 'p' command tells sed to print the line. Look at the above result: the line containing 'Gmail' got printed twice. Why? Because the default behavior of sed is to print every line after parsing it, and on top of that we asked sed to print the line containing the pattern 'Gmail' explicitly with 'p'. How to get the desired result?
$ sed -n '/Gmail/p' file
Gmail 10
The desired result can be obtained by suppressing the default printing which
can be done by using the option "-n". And hence the above result.

3. To delete the line containing the pattern 'Gmail'. In other words, to simulat
e the "grep -v" command option in sed:
$ sed '/Gmail/d' file
Yahoo 20
Redif 18
The "d" command denotes the delete the pattern. As said earlier, the default
action of sed is to print. Hence, all the other lines got printed, and the line
containing the pattern 'Gmail' got deleted since we have specified explicit "d"
option.
In the same lines, say to delete the first line of the file:
$ sed '1d' file
Yahoo 20
Redif 18
4. Print lines till you encounter a specific pattern, say till 'Yahoo' is encoun
tered.
$ sed '/Yahoo/q' file
Gmail 10
Yahoo 20
The "q" command tells to quit from that point onwards. This sed command tell
s to keep printing(which is default) and stop processing once the pattern "Yahoo
" is encountered.
Printing Range of Lines:
Till now, what we saw is to retrieve a line or a set of lines based on a condi
tion. Now, we will see how to get the same for a given range:
Consider the below sample file:
$ cat file
Gmail 10
Yahoo 20
Redif 18
Inbox 15
Live 23
Hotml 09

5. To print the first 3 lines, or from lines 1 through 3:


$ sed -n '1,3p' file
Gmail 10
Yahoo 20
Redif 18

The option "-n" suppresses the default printing. "1,3p" indicates to print fr
om lines 1 to 3.
The same can also be achieved through:
$ sed '3q' file
Gmail 10
Yahoo 20
Redif 18

3q denotes to quit after reading the 3rd line. Since the "-n" option is not u
sed, the first 3 lines get printed.
6. Similar to line number ranges, sed can also work on pattern ranges. Say, to print the lines between the patterns "Yahoo" and "Live":
$ sed -n '/Yahoo/,/Live/p' file
Yahoo 20
Redif 18
Inbox 15
Live 23

The pattern is always specified between the slashes. The comma operator is use
d to specify the range. This command tells to print all those lines between the
patterns "Yahoo" and 'Live".
7. To print the lines from pattern "Redif" till the end of the file.
$ sed -n '/Redif/,$p' file
Redif 18
Inbox 15
Live 23
Hotml 09

The earlier examples were line number ranges and pattern ranges. sed allows
us to use both (line number and pattern) in the same command itself. This comma
nd indicates to print the lines from pattern "Redif" till the end of the file($)
.
8. Similarly, to print contents from the beginning of the file till the pattern
"Inbox":
$ sed -n '1,/Inbox/p' file
Gmail 10
Yahoo 20
Redif 18
Inbox 15

We will see how to read a file into sed output, and also how to write a section of a file's content to a different file.
Let us assume we have 2 files, file1 and file2 with the following content:
$ cat file1
1apple
1banana
1mango
$ cat file2
2orange
2strawberry
sed has 2 commands for reading and writing:
r filename : reads the contents of the file specified in filename
w filename : writes the pattern space to the file specified in filename
Let us see some examples now:
1. Read the file2 after every line of file1.
$ sed 'r file2' file1
1apple
2orange
2strawberry
1banana
2orange
2strawberry
1mango
2orange
2strawberry
r file2 reads the file contents of file2. Since there is no specific number b
efore 'r', it means to read the file contents of file2 for every line of file1.
And hence the above output.
2. The above output is not very useful. Say, we want to read the file2 contents
after the 1st line of file1:
$ sed '1r file2' file1
1apple
2orange
2strawberry
1banana
1mango
'1r' indicates to read the contents of file2 only after reading the line1 of f
ile1.
3. Similarly, we can also try to read a file contents on finding a pattern:
$ sed '/banana/r file2' file1
1apple
1banana
2orange
2strawberry
1mango
The file2 contents are read on finding the pattern banana and hence the above
output.
4. To read a file content on encountering the last line:
$ sed '$r file2' file1
1apple
1banana
1mango
2orange
2strawberry
The '$' indicates the last line, and hence the file2 contents are read after the last line. Hey, hold on. The above example is shown just to illustrate the usage of $ in this scenario. If your requirement is really something like the above, you need not use sed; cat file1 file2 will do :).


Let us now move onto the writing part of sed. Consider a file, file1, with the b
elow contents:
$ cat file1
apple
banana
mango
orange
strawberry
1. Write the lines from 2nd to 4th to a file, say file2.
$ sed -n '2,4w file2' file1
The option '2,4w' indicates to write the lines from 2 to 4. What is the option "-n" for? By default, sed prints every line it reads, and hence the above command without "-n" will still print the file1 contents on the standard output. In order to suppress this default output, "-n" is used. Let us print the file2 contents to check the above output:
$ cat file2
banana
mango
orange
Note: Even after running the above command, the file1 contents still remain inta
ct.
2. Write the contents from the 3rd line onwards to a different file:
$ sed -n '3,$w file2' file1
$ cat file2
mango
orange
strawberry
As explained earlier, '3,$' indicates from the 3rd line to the end of the file.
3. To write a range of lines based on patterns, say from the line containing 'apple' through the line containing 'mango':
$ sed -n '/apple/,/mango/w file2' file1
$ cat file2
apple
banana
mango
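One caveat with 'w' is that the filename runs to the end of the script line, so writing to two different files needs the commands on separate lines or in separate -e options. A sketch splitting a small sample (the /tmp paths are illustrative):

```shell
cat > /tmp/fruits <<'EOF'
apple
banana
mango
orange
EOF

# lines 1-2 go to one file, the rest to another; each -e is its own script line
sed -n -e '1,2w /tmp/fruits.head' -e '3,$w /tmp/fruits.tail' /tmp/fruits
cat /tmp/fruits.head /tmp/fruits.tail
```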

Earlier, we saw how to insert a line or append a line to an existing file using sed. In this article, we will see how we can do data manipulation or substitution in files using sed.
Let us consider a sample file, sample1.txt, as shown below:

apple
orange
banana
pappaya

1. To add something to the beginning of a every line in a file, say to add a wor
d Fruit:
$ sed 's/^/Fruit: /' sample1.txt
Fruit: apple
Fruit: orange
Fruit: banana
Fruit: pappaya
The character 's' stands for substitution. What follows 's' is the character, word or regular expression to replace, followed by the character, word or regular expression to replace it with. '/' is used to separate the substitution command 's', the content to replace, and the replacement. The '^' character means substitute at the beginning, and hence every line gets the phrase 'Fruit: ' added at the beginning.
2. Similarly, to add something to the end of the file:
$ sed 's/$/ Fruit/' sample1.txt
apple Fruit
orange Fruit
banana Fruit
pappaya Fruit
The character '$' is used to denote the end of the line. Hence this means:
replace the end of the line with ' Fruit', which effectively appends the word
'Fruit' to the end of every line.
3. To replace or substitute a particular character, say to replace 'a' with 'A'.
$ sed 's/a/A/' sample1.txt
Apple
orAnge
bAnana
pAppaya
Please note that in every line only the first occurrence of 'a' is replaced,
not all. The example shown here is a single-character replacement, which can
just as easily be done for a word as well.
4. To replace or substitute all occurrences of 'a' with 'A'
$ sed 's/a/A/g' sample1.txt
Apple
orAnge
bAnAnA
pAppAyA
5. Replacing the first occurrence or all occurrences is fine. But what if we
want to replace the second or third occurrence, in other words the nth occurrence?
To replace only the 2nd occurrence of a character:
$ sed 's/a/A/2' sample1.txt
apple
orange
banAna
pappAya
Please note above: the 'a' in apple has not changed, and neither has the one in
orange, since there is no 2nd occurrence of 'a' in them. However, the changes
have happened appropriately in banana and pappaya.
6. Now, say to replace all occurrences from 2nd occurrence onwards:
$ sed 's/a/A/2g' sample1.txt
apple
orange
banAnA
pappAyA
7. Say, you want to replace 'a' only in a specific line say 3rd line, not in the
entire file:
$ sed '3s/a/A/g' sample1.txt
apple
orange
bAnAnA
pappaya
'3s' denotes the substitution to be done is only for the 3rd line.
8. To replace or substitute 'a' on a range of lines, say from 1st to 3rd line:
$ sed '1,3s/a/A/g' sample1.txt
Apple
orAnge
bAnAnA
pappaya
9. To replace the entire line with a modified version of itself. For example,
to turn 'apple' into 'apple is a Fruit':
$ sed 's/.*/& is a Fruit/' sample1.txt
apple is a Fruit
orange is a Fruit
banana is a Fruit
pappaya is a Fruit
The '&' symbol denotes the entire pattern matched. In this case, since we are
using '.*', which matches the entire line, '&' contains the entire line. This
type of matching is really useful when you have a file containing a list of
file names and you want to, say, rename them as shown in one of our earlier
articles: Rename group of files.

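As a sketch of that rename trick (the file name files.txt and the .bak suffix here are purely illustrative):

```shell
# Hypothetical input: files.txt holds one file name per line.
printf 'report1.txt\nreport2.txt\n' > files.txt

# '&' recalls the whole matched line, so each name appears twice:
# once as the source and once (suffixed) as the target of mv.
sed 's/.*/mv & &.bak/' files.txt
```

Each output line looks like `mv report1.txt report1.txt.bak`; piping the result to `sh` would actually execute the renames.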
10. Using sed, we can also do multiple substitutions. For example, to replace
all 'a' with 'A', and all 'p' with 'P':
$ sed 's/a/A/g; s/p/P/g' sample1.txt
APPle
orAnge
bAnAnA
PAPPAyA
Or, this can also be done as:
$ sed -e 's/a/A/g' -e 's/p/P/g' sample1.txt
APPle
orAnge
bAnAnA
PAPPAyA
The option '-e' is used when you have more than one set of substitutions to be
done.
Or, the multiple substitutions can be done spanning multiple lines, as shown below:
$ sed -e 's/a/A/g' \
> -e 's/p/P/g' sample1.txt
APPle
orAnge
bAnAnA
PAPPAyA
sed is one of the most important editors we use in UNIX. It supports a lot of
file editing tasks. In this article, we will see a specific set of sed options.
Assume I have a flat file, empFile, containing employee name and employee id as
shown below:
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
1. How to add a header line say "Employee, EmpId" to this file using sed?
$ sed '1i Employee, EmpId' empFile
Employee, EmpId
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
This command does the following: the number '1' tells sed that the operation
is to be done only for the first line. 'i' stands for inserting the given
content before the addressed line. So, '1i' means to insert the following text
before the first line, and hence we get the header in the output.
However, the file with the header is displayed only in the output; the original
file contents remain unchanged. So, if the requirement is to update the
original file with this output, the user has to redirect the output of the sed
command to a temporary file and then move it over the original file.
UNIX systems with the GNU version of sed have the '-i' option, which edits the
file in-place. Let us see the same example using the '-i' option:

$ cat empFile
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
$ sed -i '1i Employee, EmpId' empFile
$ cat empFile
Employee, EmpId
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
As shown above, the '-i' option edits the file in-place without the need of a
temporary file.
2. How to add a line '-------' after the header line or the 1st line?
$ sed -i '1a ---------------' empFile
$ cat empFile
Employee, EmpId
---------------
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
'1a' is similar to '1i', except that while 'i' inserts the content before the
addressed line, 'a' appends the content after it. Hence in this case, the
'----' line gets included after the 1st line. As you might have guessed, using
'2i' would have worked equally well here.
3. How to add a trailer line to this file?
$ sed -i '$a ---------------' empFile
$ cat empFile
Employee, EmpId
---------------
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
---------------
To add a line after the last line of the file, we would need to know the total
line count to use the above mentioned methods. However, sed has the '$' symbol
which denotes the last line. '$a' tells sed to append the following content
after the last line of the file.
4. How to add a record after a particular record?
Let us assume the sample file contains only 3 records as shown below:
Employee, EmpId
---------------

Hilesh, 1001
Harshal, 1004
Keyur, 1005
---------------
Now, if I want to insert the record for the employee 'Bharti' after the
employee 'Hilesh':
$ sed -i '/Hilesh/a Bharti, 1002' empFile
$ cat empFile
Employee, EmpId
---------------
Hilesh, 1001
Bharti, 1002
Harshal, 1004
Keyur, 1005
---------------
If you look at the above sed command carefully, all we have done is use a
pattern in place of a number. '/Hilesh/a' tells sed to append the following
content after finding the pattern 'Hilesh', and hence the result.
5. How to add a record before a particular record? Say, add the record for the e
mployee 'Aparna' before the employee record of 'Harshal'
$ sed -i '/Harshal/i Aparna, 1003' empFile
$ cat empFile
Employee, EmpId
---------------
Hilesh, 1001
Bharti, 1002
Aparna, 1003
Harshal, 1004
Keyur, 1005
---------------
Similarly, '/Harshal/i' tells sed to insert the following content before the
line containing the pattern 'Harshal'.
Note: As said above, the '-i' option will only work if the sed is GNU sed.
Otherwise, the user has to redirect the output to a temporary file and move it
over the original file.

We will now see examples of how to remove or delete characters from a file.
The syntax of sed command replacement is:
$ sed 's/find/replace/' file
This sed command finds the pattern and replaces it with another pattern. When
the replacement is left empty, the matched pattern gets deleted.
Let us consider a sample file as below:
$ cat file
Linux
Solaris

Ubuntu
Fedora
RedHat
1. To remove a specific character, say 'a'
$ sed 's/a//' file
Linux
Solris
Ubuntu
Fedor
RedHt
This will remove the first occurrence of 'a' in every line of the file. To
remove all occurrences of 'a' in every line:
$ sed 's/a//g' file
2. To remove 1st character in every line:
$ sed 's/^.//' file
inux
olaris
buntu
edora
edHat
The . (dot) matches any single character. The ^ anchors the match to the
beginning of the line. Another way to write the same:
$ sed 's/.//' file
This tells sed to replace a character with nothing. Since sed by default starts
from the beginning, it replaces only the 1st character, as 'g' is not passed.
3. To remove last character of every line :
$ sed 's/.$//' file
Linu
Solari
Ubunt
Fedor
RedHa
The $ tries to match a pattern in the end of the line.
4. To remove the 1st and last character of every line in the same command:
$ sed 's/.//;s/.$//' file
inu
olari
bunt
edor
edHa
Two commands can be given together with a semi-colon separated in between.
5. To remove first character only if it is a specific character:
$ sed 's/^F//' file

Linux
Solaris
Ubuntu
edora
RedHat
This removes the 1st character only if it is 'F'.
6. To remove last character only if it is a specific character:
$ sed 's/x$//' file
Linu
Solaris
Ubuntu
Fedora
RedHat
This removes the last character only if it is 'x'.
7. To remove 1st 3 characters of every line:
$ sed 's/...//' file
ux
aris
ntu
ora
Hat
A single dot(.) removes 1st character, 3 dots remove 1st three characters.
8. To remove 1st n characters of every line:
$ sed -r 's/.{4}//' file
x
ris
tu
ra
at
.{n} -> matches any character n times, and hence the above expression matches
the first 4 characters and deletes them.
9. To remove last n characters of every line:
$ sed -r 's/.{3}$//' file
Li
Sola
Ubu
Fed
Red
10. To remove everything except the 1st n characters in every line:
$ sed -r 's/(.{3}).*/\1/' file
Lin
Sol
Ubu
Fed
Red

.* -> matches any number of characters, and the first 3 characters matched are
grouped using parentheses. In the replacement, by using \1 only the group is
retained, leaving out the remaining part.
11. To remove everything except the last n characters in every line:
$ sed -r 's/.*(.{3})/\1/' file
nux
ris
ntu
ora
Hat
Same as the last example, except that the group keeps the last 3 characters instead of the first.
12. To remove multiple characters present in a file:
$ sed 's/[aoe]//g' file
Linux
Slris
Ubuntu
Fdr
RdHt
To delete multiple characters, [] is used, specifying the characters to be
removed. This will remove all occurrences of the characters a, o and e.
13. To remove a pattern :
$ sed 's/lari//g' file
Linux
Sos
Ubuntu
Fedora
RedHat
Not just a character, even a pattern can be removed. Here, 'lari' got removed f
rom 'Solaris'.
14. To delete only nth occurrence of a character in every line:
$ sed 's/u//2' file
Linux
Solaris
Ubunt
Fedora
RedHat
By default, sed performs an activity only on the 1st occurrence. If n is
specified, sed acts only on the nth occurrence of the pattern. The 2nd 'u' of
'Ubuntu' got deleted.
15. To delete everything in a line from a given character onwards:
$ sed 's/a.*//' file
Linux
Sol
Ubuntu
Fedor

RedH
16. To remove all digits present in every line of a file:
$ sed 's/[0-9]//g' file
[0-9] stands for all characters between 0 and 9, meaning all digits, and hence
all digits get removed.
17. To remove all lower case alphabets present in every line:
$ sed 's/[a-z]//g' file
L
S
U
F
RH
[a-z] represents the lower case alphabet range, and hence all lower-case
characters get removed.
18. To remove everything other than the lower case alphabets:
$ sed 's/[^a-z]//g' file
inux
olaris
buntu
edora
edat
^ inside square brackets negates the condition. Here, all characters except
lower case alphabets get removed.
19. To remove all alpha-numeric characters present in every line:
$ sed 's/[a-zA-Z0-9]//g' file
All alpha-numeric characters get removed.
20. To remove a character irrespective of the case:
$ sed 's/[uU]//g' file
Linx
Solaris
bnt
Fedora
RedHat
Specifying both the lower and upper case character inside brackets is
equivalent to removing the character irrespective of case.
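As an aside, GNU sed also offers an 'I' flag on the s command for case-insensitive matching; this is a GNU extension, so the bracket form above remains the portable choice:

```shell
printf 'Linux\nUbuntu\n' > file

# GNU-only 'I' flag: match 'u' regardless of case, all occurrences.
sed 's/u//Ig' file
# Linx
# bnt
```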

How to use sed to work with a CSV file? Or, how to work with any file in which
fields are separated by a delimiter?
Let us consider a sample CSV file with the following content:
$ cat file

Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
1. To remove the 1st field or column :
$ sed 's/[^,]*,//' file
25,11
31,2
21,3
45,4
12,5
This regular expression searches for a sequence of non-comma characters
followed by a comma ([^,]*,) and deletes it, which results in the 1st field
getting removed.
2. To print only the last field, OR remove all fields except the last field:
$ sed 's/.*,//' file
11
2
3
4
5
This regex removes everything up to the last comma (.*,), which results in
deleting all the fields except the last field.
3. To print only the 1st field:
$ sed 's/,.*//' file
Solaris
Ubuntu
Fedora
LinuxMint
RedHat
This regex (,.*) removes the characters from the 1st comma till the end,
resulting in deleting all the fields except the first field.
4. To delete the 2nd field:
$ sed 's/,[^,]*,/,/' file
Solaris,11
Ubuntu,2
Fedora,3
LinuxMint,4
RedHat,5
The regex (,[^,]*,) searches for a comma, a sequence of non-comma characters
and another comma, which matches the 2nd column, and replaces the matched
pattern with just a comma, ultimately deleting the 2nd column.
Note: Deleting fields in the middle gets tougher in sed since every field has
to be matched literally.
5. To print only the 2nd field:
$ sed 's/[^,]*,\([^,]*\).*/\1/' file
25
31
21
45
12
The regex matches the first field, the second field and the rest, but groups
the 2nd field alone. The whole line is then replaced with the 2nd field (\1),
hence only the 2nd field gets displayed.
6. Print only lines in which the last column is a single digit number:
$ sed -n '/.*,[0-9]$/p' file
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
The regex (,[0-9]$) checks for a single digit in the last field, and the p
command prints the lines which match this condition.
7. To number all lines in the file:
$ sed = file | sed 'N;s/\n/ /'
1 Solaris,25,11
2 Ubuntu,31,2
3 Fedora,21,3
4 LinuxMint,45,4
5 RedHat,12,5

This is a simulation of the cat -n command. awk does it easily using the
special variable NR. The '=' command of sed prints the line number of every
line followed by the line itself. The sed output is piped to another sed
command to join every 2 lines.
8. Replace the last field by 99 if the 1st field is 'Ubuntu':
$ sed 's/\(Ubuntu\)\(,.*,\).*/\1\299/' file
Solaris,25,11
Ubuntu,31,99
Fedora,21,3
LinuxMint,45,4
RedHat,12,5
This regex matches 'Ubuntu', then everything up to the last column, grouping
each part. In the replacement, the 1st and 2nd groups are substituted back,
followed by the new number 99.
9. Delete the 2nd field if the 1st field is 'RedHat':
$ sed 's/\(RedHat,\)[^,]*\(.*\)/\1\2/' file
Solaris,25,11
Ubuntu,31,2
Fedora,21,3
LinuxMint,45,4
RedHat,,5
The 1st field 'RedHat,', the 2nd field and the remaining fields are grouped,
and the replacement is done with only the 1st and the last group, resulting in
the 2nd field getting deleted.


10. To insert a new column at the end(last column) :
$ sed 's/.*/&,A/' file
Solaris,25,11,A
Ubuntu,31,2,A
Fedora,21,3,A
LinuxMint,45,4,A
RedHat,12,5,A
The regex (.*) matches the entire line, replacing it with the line itself (&)
followed by the new field.
11. To insert a new column in the beginning(1st column):
$ sed 's/.*/A,&/' file
A,Solaris,25,11
A,Ubuntu,31,2
A,Fedora,21,3
A,LinuxMint,45,4
A,RedHat,12,5
Same as last example, just the line matched is followed by the new column.
Note: sed is generally not preferred on files which have fields separated by a
delimiter, because it is very difficult to access fields in sed, unlike awk or
Perl where splitting fields is a breeze.
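To illustrate that point, splitting fields in awk needs no regular expression at all; declaring the delimiter once makes every field addressable:

```shell
printf 'Solaris,25,11\nUbuntu,31,2\n' > file

# Fields split on ',' become $1, $2, ... with no regex gymnastics.
awk -F, '{print $2}' file
# 25
# 31
```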

We will now see how to print a particular line using the print (p) command of sed.
Let us consider a file with the following contents:
$ cat file
AIX
Solaris
Unix
Linux
HPUX
1. Print only the first line of the file:
$ sed -n '1p' file
AIX
Similarly, to print a particular line, put the line number before 'p'.
2. Print only the last line of the file
$ sed -n '$p' file
HPUX
$ indicates the last line.
3. Print lines which do not contain 'X':
$ sed -n '/X/!p' file
Solaris
Unix
Linux
!p indicates the negative condition to print.
4. Print lines which contain the character 'u' or 'x' :
$ sed -n '/[ux]/p' file
Unix
Linux
[ux] indicates line containing the pattern either 'u' or 'x'.
5. Print lines which end with 'x' or 'X' :
$ sed -n '/[xX]$/p' file
AIX
Unix
Linux
HPUX
6. Print lines beginning with either 'A' or 'L':
$ sed -n '/^A\|^L/p' file
AIX
Linux
The pipe is used to provide multiple patterns for matching. In this way, any
number of patterns can be provided for searching.
7. Print every alternate line:
$ sed 'n;d' file
AIX
Unix
HPUX
The n command prints the current line and immediately reads the next line into
the pattern space. The d command deletes the line present in the pattern space.
In this way, alternate lines get printed.
8. Print every 2 lines:
$ sed 'n;n;N;d' file
AIX
Solaris
HPUX
n;n; => This prints 2 lines, after which the 3rd line is present in the
pattern space. The N command reads the next line and joins it with the current
line, and d deletes the entire contents of the pattern space. With this, the
3rd and 4th lines present in the pattern space get deleted. Since this repeats
till the end of the file, it ends up printing every 2 lines.
9. Print lines ending with 'X' within a range of lines:
$ sed -n '/Unix/,${/X$/p;}' file

HPUX
The range of lines chosen starts from the line containing the pattern 'Unix'
and runs till the end of the file ($). The commands present within the braces
are applied only for this range of lines. Within this group, only the lines
ending with 'X' are printed. Refer to our earlier article on printing a range
of lines using sed, from example 5 onwards.
10. Print range of lines excluding the starting and ending line of the range:
$ sed -n '/Solaris/,/HPUX/{//!p;}' file
Unix
Linux
The range of lines chosen is from 'Solaris' to 'HPUX'. The action within the
braces is applied only for this range of lines. If no pattern is provided in
the pattern match (//), the last matched pattern is considered. For example,
when the line containing the pattern 'Solaris' starts the range and gets
inside the curly braces, since no pattern is present, the last pattern
('Solaris') is matched. Since this match is true, the line is not printed
(!p), and the same holds for the last line in the group as well.

How to find the sum of all numbers or columns in a line of a text / CSV file?
$ sed 's/ /+/g' file | bc
$ awk '{print $1+$2+$3}' file
The sed version converts the spaces into '+' signs and feeds the resulting
expression to bc; the awk version simply adds the first three fields.
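The awk command above assumes exactly three columns; a small loop over NF handles any number of fields (a sketch of the generalization):

```shell
printf '1 2 3\n10 20 30 40\n' > nums

# Sum every field on each line, whatever the column count.
awk '{s=0; for(i=1;i<=NF;i++) s+=$i; print s}' nums
# 6
# 100
```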
