Computing.Net > Forums > Unix > trouble merging two files with awk

trouble merging two files with awk


Original Message
Name: pda976
Date: October 14, 2007 at 18:01:10 Pacific
Subject: trouble merging two files with awk
OS: OSF1
CPU/Ram: alpha 1Gb ram
Model/Manufacturer: compaq
Comment:

Hi All,

I've seen many questions and answers about combining or merging two files with awk, but I can't quite get them to work for my situation. There may be an answer out there that I just haven't stumbled upon yet. My understanding of awk arrays is letting me down.

First, I need an awk solution: sort, join or paste will not work, because the two files may not have the same key records.

I have two files, called file1 and file2 for ease.
I need to append fields $2 and $3 of file2 to the end of each file1 line where $1 in file2 equals $2 in file1.

If a record exists in file1 but not in file2, I still need to output the separators so that the field count stays consistent.

The files are not sorted. They could be, but as mentioned above, there can be records in file1 that have no match in file2.

There will be around 1 million records.
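To time the scripts at roughly that volume, synthetic input files can be generated first. This is a minimal sketch that is not part of the original thread: `gen_records` is a hypothetical helper name, the record layout simply copies the sample lines below, and leaving every 10th key out of file2 is an arbitrary choice so that the no-match case is exercised too.

```shell
#!/bin/sh
# Hypothetical helper: write N synthetic records to file1 and file2 in the
# layout of the sample data. Every 10th file1 key is left out of file2 so
# the "no match" path is exercised. Default N is 1000000.
gen_records() {
    awk -v n="${1:-1000000}" 'BEGIN {
        for (i = 1; i <= n; i++) {
            id = sprintf("%08d", i)      # 8-digit first field
            key = "104" id               # 11-digit join key
            print id "|" key "|BLABLABLA|" id "|1||3|" > "file1"
            if (i % 10 != 0)
                print key "|" (i % 2) "|" (i % 3) "|" > "file2"
        }
        close("file1"); close("file2")
    }'
}
```

Running `gen_records 1000000` in an empty directory writes 1,000,000 lines to file1 and 900,000 to file2.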

file1
27896370|10411223311|BLABLABLA 0411223311|27896370|1||3|
27896381|10411223322|BLABLABLA 0411223322|27896381|1||3|
64979764|10311223333|BLABLABLA|64979764|2||3|

file2
10411223311|0|2|
10311223333|0|1|
10411223322|1|0|

expected file3 output
27896370|10411223311|BLABLABLA 0411223311|27896370|1||3|0|2|
27896381|10411223322|BLABLABLA 0411223322|27896381|1||3|1|0|
64979764|10311223333|BLABLABLA|64979764|2||3|0|1|
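Because every output line must keep the same number of fields even when a key is missing from file2, a quick check on the result can help. A minimal sketch, assuming the merged output is in file3 (the name used above) and that `check_fields` is a hypothetical helper: it reports any line whose `|`-separated field count differs from the first line's and exits non-zero in that case.

```shell
#!/bin/sh
# Hypothetical helper: verify that every line of a merged file has the same
# number of "|"-separated fields as its first line. Defaults to file3.
check_fields() {
    awk -F'|' '
        NR == 1 { want = NF }
        NF != want {
            printf "line %d: %d fields, expected %d\n", NR, NF, want
            bad = 1
        }
        END { exit bad }
    ' "${1:-file3}"
}
```

For the expected output above, each line splits into 10 fields (the trailing `|` adds an empty last field), so `check_fields file3` prints nothing and returns 0.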


Thanks in advance to anyone who can assist.
Cheers



Response Number 1
Name: James Boothe
Date: October 16, 2007 at 08:23:01 Pacific
Subject: trouble merging two files with awk


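awk -F\| 'BEGIN {
    # Load file2 into memory: key = field 1, value = "field2|field3"
    while ((getline < "file2") > 0)
        f2data[$1] = $2 "|" $3
}
{
    # Look up field 2 of the current file1 record
    c2 = f2data[$2]
    # No match: emit an empty pair so every line has the same field count
    if (c2 == "") c2 = "|"
    print $0 c2 "|"
}' file1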
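One detail of the `while ((getline < "file2") > 0)` loop is worth knowing: getline returns 1 for each record read, 0 at end of file and -1 on an error such as a missing file, so the `> 0` test is what stops the loop from spinning forever. It also means an unreadable file2 is silently treated as empty, and every file1 line gets an empty pair. A minimal sketch of the same script with an explicit check, wrapped in a hypothetical shell function `merge_files` that takes file1 and the lookup file as arguments:

```shell
#!/bin/sh
# Sketch: first version from the thread plus a check for an unreadable
# lookup file. Usage: merge_files file1 file2 > file3
merge_files() {
    awk -F'|' -v lookup="$2" 'BEGIN {
        # getline returns 1 per record, 0 at end of file, -1 on error
        while ((rc = (getline < lookup)) > 0)
            f2data[$1] = $2 "|" $3
        if (rc < 0) {
            print "cannot read " lookup | "cat 1>&2"
            exit 2
        }
    }
    {
        c2 = f2data[$2]
        if (c2 == "") c2 = "|"
        print $0 c2 "|"
    }' "$1"
}
```

Called with a lookup file that does not exist, it prints `cannot read ...` on standard error and returns exit status 2 instead of producing a file3 full of empty pairs.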

And here it is coded a slightly different way. In the solution above, I pulled the array entry into a variable and then checked whether that variable was null. In the solution below, I first check whether the array entry exists and, only if it does, pull that entry. I would be surprised if you could see any difference in run time between the two versions.

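awk -F\| 'BEGIN {
    # Load file2 into memory: key = field 1, value = "field2|field3"
    while ((getline < "file2") > 0)
        f2data[$1] = $2 "|" $3
}
{
    # Test membership first so missing keys are not added to the array
    if ($2 in f2data)
        c2 = f2data[$2]
    else
        c2 = "|"
    print $0 c2 "|"
}' file1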
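For comparison, the same merge is often written with the common `NR == FNR` two-file idiom instead of getline: both files are passed on the command line, lookup file first, and `NR == FNR` holds only while awk is reading that first file. This is a different technique from the one in the thread, shown as a sketch; `merge_nr_fnr` is a hypothetical name. One caveat: if the lookup file is empty, `NR == FNR` stays true while file1 is read, so file1 lines would be loaded into the table and nothing printed. The getline versions above do not have that problem.

```shell
#!/bin/sh
# Sketch of the NR == FNR idiom. Usage: merge_nr_fnr file1 file2 > file3
# Caveat: an empty lookup file makes NR == FNR true for file1 as well.
merge_nr_fnr() {
    awk -F'|' '
        # First file on the command line (the lookup file): build the table
        NR == FNR { f2data[$1] = $2 "|" $3; next }
        # Remaining file (file1): append the pair, or an empty pair if absent
        { print $0 (($2 in f2data) ? f2data[$2] : "|") "|" }
    ' "$2" "$1"
}
```

Like the getline versions, it holds the file2 table in memory and streams file1 one line at a time.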



Response Number 2
Name: pda976
Date: October 16, 2007 at 23:02:04 Pacific
Subject: trouble merging two files with awk

Thanks heaps, James. I was trying similar things but couldn't quite crack it.

