AWK: compare 2 files and generate multiple files

December 27, 2010 at 05:28:58
Specs: HP UX
Hi, I have a problem using awk. I have already used it for simple tasks, but I am finding this one hard to complete.

I have 2 files.
The first looks like

01 696008326
01 696008327
02 550310242
02 550310258
03 550310283
03 550310306

The second looks like

TSTAPH201465001000000077T7550V 00000051S696008326
TSTAPH125485001000000081T7550V 00000051S696008327
TSTAPH125487001000000081T7540V 00000051S550310242
TSTAPH126506001000000081T7550V 00000051S550310258
TSTAPH126511001000000081T7550V 00000051S550310283
TSTAPH126557001000000081T7550V 00000051S550310306
TSTAPH126557001000000081T7550V 00000051S550310316


I "simply" need to create new files in the same format as the second file, but under new names:

myfile01 contains only the records
TSTAPH201465001000000077T7550V 00000051S696008326
TSTAPH125485001000000081T7550V 00000051S696008327

myfile02 contains only
TSTAPH125487001000000081T7540V 00000051S550310242
TSTAPH126506001000000081T7550V 00000051S550310258

and so on for 03
TSTAPH126511001000000081T7550V 00000051S550310283
TSTAPH126557001000000081T7550V 00000051S550310306

and a myfileXX
with the records whose keys are in the second file but not in the first, like
TSTAPH126557001000000081T7550V 00000051S550310316

How is this possible?



#1
December 27, 2010 at 15:27:11
This looks like homework, so I will get you started. Read the first file into an array (s in this case). Then read the second file and check whether the last field is matched by an element in the array.

Here is the stub that reads the first file and I'll leave the rest to you:

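awk ' BEGIN {
   # load file1; parentheses make the comparison apply to the getline result
   while ( (getline < "file1") > 0 )
      s[$2]=$2      # index the array by the key in column 2
}
{
# the rest of the program goes here

} ' file2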

Here are two more hints:

As you read file2, the last field of each line is $NF, where NF is the number of fields in the line.

You append to a file using >>. This stub builds the filename myfile01 and prints the string "str" to it:

i=0
i++
nf="myfile0"i
print "str" >> nf



#2
December 27, 2010 at 15:35:59
It is not homework, it is a real case. I have many db instances: one contains all the data, and then I have an instance for 01, 02, 03 and so on.
So I have to split the customer file into many files, each to be processed on a single instance, based on a query on the main instance.
It seems easy because I simplified the record layout. The second file has many fields (up to 900 chars), and both file 1 and file 2 have more than 7000 rows.
So the key is not the last field.
Thank you very much, I'll try to build this solution when at work. :) I was about to give up and do it with many greps and file editing.
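
When the key sits at a fixed column rather than in the last field, substr() can pull it out by position. This is only a sketch: keypos=41 and keylen=9 are hypothetical values that happen to fit the simplified sample record above, and the real 900-char layout will need its own offsets.

```shell
# One record from the simplified sample in the question.
record='TSTAPH201465001000000077T7550V 00000051S696008326'

# keypos and keylen are assumptions for this sample layout only.
key=$(printf '%s\n' "$record" |
      awk -v keypos=41 -v keylen=9 '{ print substr($0, keypos, keylen) }')

echo "$key"
```

Printing the extracted key for a few records is a cheap way to check the offsets before splitting the whole file.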


#3
December 27, 2010 at 17:07:04
Solution based on your data .....

awk ' BEGIN { ind=1; cnt=0; filename="myfile1"
   # load the keys from file1; parentheses make the comparison apply to getline
   while ( (getline < "file1") > 0 )
      s[$2]=$2
}
{
found=0
for(i in s)
   if(match($NF, s[i]) > 0)
      {  # print the line if a match found in the array
      found=1
      print $0 >> filename
      cnt++
      # change file name after printing two lines (as in the sample data)
      if(cnt == 2)
         {
         ind++
         filename="myfile"ind
         cnt=0
         }
      }

if(found == 0)
    print $0 >> "myfileXX"

} ' file2
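
A possible variation, in case the files do not always hold exactly two records per prefix: take the output name from column 1 of file1 instead of counting lines. This is a sketch and not the method posted above. It assumes the key is the last 9 characters of each file2 record (true for the sample, with no trailing blanks), and it recreates the sample files from the question so it can be run as is.

```shell
# Sample input copied from the question.
cat > file1 <<'EOF'
01 696008326
01 696008327
02 550310242
02 550310258
03 550310283
03 550310306
EOF

cat > file2 <<'EOF'
TSTAPH201465001000000077T7550V 00000051S696008326
TSTAPH125485001000000081T7550V 00000051S696008327
TSTAPH125487001000000081T7540V 00000051S550310242
TSTAPH126506001000000081T7550V 00000051S550310258
TSTAPH126511001000000081T7550V 00000051S550310283
TSTAPH126557001000000081T7550V 00000051S550310306
TSTAPH126557001000000081T7550V 00000051S550310316
EOF

awk '
NR == FNR {                  # first file: remember the prefix for each key
    prefix[$2] = $1
    next
}
{
    # assumption: the key is the last 9 characters of the record
    key = substr($0, length($0) - 8)
    if (key in prefix)
        out = "myfile" prefix[key]
    else
        out = "myfileXX"     # key not listed in file1
    print > out
}' file1 file2
```

The lookup is a single array test per record rather than a loop over every key, which matters with 7000+ rows. Within one awk run, > truncates each output file the first time it is opened and appends afterwards, so rerunning the command does not duplicate records the way >> would. Some awk implementations limit the number of files open at once; with many prefixes, switching to >> together with close(out) is one way around that.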

