match and sort using awk

March 24, 2010 at 12:47:32
Specs: Linux i686
Hello,

can I match and sort data from two fields using awk?

For example, given this input:

apple cat
in rat
out sky
cat pen
rat ball
sky cpu
pen linux
ball unix
cpu paper
linux phone
unix
paper
phone
phone
paper
unix
cpu

Desired output:
apple
ball ball
cat cat
cpu cpu
cpu
in
linux linux
out
paper paper
paper
pen pen
phone phone
phone
rat rat
sky sky
unix unix
unix



#1
March 24, 2010 at 23:22:13
Interesting problem. Classic awk does not have a sort function, but the GNU version, gawk, does: asort. I read the first field of every line into one array and, when a second field is present, store it as a key in a second array. Once the file is read, I sort the first array and then check each sorted entry against the second array. If a match exists, I print both values and delete the second array entry so it is not used again.

This program assumes that the second field, if defined, is unique within the file:


#!/bin/bash

gawk ' BEGIN { idx1 = 0 }
{
    idx1++
    # build the first array
    myarr1[idx1] = $1

    # build the second array only if second field exists
    if (NF > 1)
        myarr2[$2] = $2
}

END {
    # asort is a gawk extension; it sorts the values and reindexes from 1
    asort(myarr1)

    for (i = 1; i <= idx1; i++)
        if (myarr1[i] in myarr2) {
            printf("%s %s\n", myarr1[i], myarr2[myarr1[i]])
            # remove the entry after printing once
            delete myarr2[myarr1[i]]
        }
        else
            printf("%s\n", myarr1[i])
} ' datafile.txt
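
If gawk is not available, the same match-once idea can probably be done with plain awk by letting the external sort command do the sorting. This is only a sketch under a few assumptions: match_sort is a made-up function name, LC_ALL=C is my choice to get plain byte ordering, and blank input lines are skipped. The final awk reads the original file first to collect the second fields, then reads the sorted first fields from standard input ("-").

```shell
#!/bin/sh

# match_sort FILE  (hypothetical helper name, not part of any tool)
# Prints the sorted first fields of FILE; the first time a value also
# occurs somewhere as a second field, it is printed twice on that line.
match_sort() {
    # Stage 1: extract the first field of every non-blank line.
    # Stage 2: sort in byte order (LC_ALL=C is an assumption).
    # Stage 3: awk reads FILE to collect second fields, then "-" (the
    #          sorted first fields) to print the result.
    awk 'NF { print $1 }' "$1" | LC_ALL=C sort | awk '
        # first operand (the original file): remember each second field
        NR == FNR { if (NF > 1) second[$2] = 1; next }

        # second operand (sorted first fields): print a match only once
        ($1 in second) { print $1, $1; delete second[$1]; next }
        { print $1 }
    ' "$1" -
}
```

Called as match_sort datafile.txt, it should give the same kind of output as the gawk version for input like the sample above.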


