Hi, I have an input file like this:
TH2TH2867Y NOW33332106Yo You Baby
TH2TH3867Y NOW33332106No Way Out
TH2TH9867Y NOW33332106Can't find it
TJ2TJ2872N WOW33332017sure thing alas
TJ2TJ3872N WOW33332017the sky rocks
TJ2TJ4872N WOW33332017nothing else matters
TJ2TJ5872N WOW33332017you know about it
TJ2TJ6872N WOW33331999nothing else matters
TJ2TJ7872N WOW33332017nothing else matters
TJ2TJ8872N WOW33332017No Way Out
TJ2TAW872N WOW33331999No Way Out
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend
I am trying to get output like this:
Yo You Baby|TH2
sure thing alas|TJ2
No Way Out.|TJA
Jacka nd jill.|TJB
Here TH2, TJ2, TJA and TJB are the distinct values of the first 3 characters in the input.
For each input line, let fr=substr($0,1,3) and nx=substr($0,4,3).
Basically, I want to check whether the first 3 characters (fr) equal the next 3 characters (nx);
if they do, print substr($0,23,20) and substr($0,1,3).
If no line for a given fr matches, print the first occurrence of that fr with its associated substr($0,23,20).
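To make the column offsets concrete, here is a small sketch that splits one record into the three pieces used above. The function name split_record is made up for illustration, and the offsets assume the fixed-width layout of the sample lines.

```shell
# Hypothetical helper: show fr, nx and the text field of one record.
split_record() {
  printf '%s\n' "$1" | awk '{
    fr  = substr($0, 1, 3)     # columns 1-3
    nx  = substr($0, 4, 3)     # columns 4-6
    txt = substr($0, 23, 20)   # columns 23-42 (shorter if the line ends early)
    print "fr=" fr, "nx=" nx, "text=" txt
  }'
}
```

For example, split_record 'TH2TH2867Y NOW33332106Yo You Baby' prints fr=TH2 nx=TH2 text=Yo You Baby.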
I started with something like this:
awk 'BEGIN{OFS="|"}{fr=substr($0,1,3);nx=substr($0,4,3); if (fr == nx) print substr($0,23,20),fr}' inputfile |
nawk 'BEGIN{FS="|";OFS="|"}{ sub(/[ \t]*$/, "",$1);print $1,$2}'
But this skips every line where fr and nx don't match.
In my example above, fr never matches nx on these lines:
TJAPXC050Y NOW33331999No Way Out.
TJAT1N999Y NOW33331999still loving you.
TJBJOG575Y NOW33331999Jacka nd jill.
TJBJXG575Y NOW33331999Julie and friend
But I would also like to get the result below (the first occurrence of each fr and its substr):
No Way Out.|TJA
Jacka nd jill.|TJB
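One possible way to cover both cases in a single pass, offered as a sketch under the assumptions stated here: remember the first text seen for each fr, replace it if a later line for the same fr has fr == nx, and print everything at the end in order of first appearance. The function name pick_first_or_match is made up for illustration, and the sketch assumes an awk that supports sub() and the in operator (nawk or any POSIX awk should do).

```shell
# Hypothetical helper: one output line per distinct fr, read from file $1.
pick_first_or_match() {
  awk '
    BEGIN { OFS = "|" }
    {
      fr  = substr($0, 1, 3)      # group key: columns 1-3
      nx  = substr($0, 4, 3)      # columns 4-6
      txt = substr($0, 23, 20)    # text field: columns 23-42
      sub(/[ \t]+$/, "", txt)     # trim trailing blanks
      if (!(fr in text)) {        # first line seen for this key
        order[++n] = fr           # remember order of first appearance
        text[fr] = txt
        matched[fr] = (fr == nx)
      } else if (!matched[fr] && fr == nx) {
        text[fr] = txt            # a matching line wins over the first one
        matched[fr] = 1
      }
    }
    END { for (i = 1; i <= n; i++) print text[order[i]], order[i] }
  ' "$1"
}
```

Called as pick_first_or_match inputfile on the sample input above, this prints:
Yo You Baby|TH2
sure thing alas|TJ2
No Way Out.|TJA
Jacka nd jill.|TJB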
Help!
Regards,
Big Gun