Trouble merging 2 files (+math) in AWK

August 23, 2010 at 12:30:26
Specs: Windows XP
Hi All,

I have two files (file 1 and file 2) that I need to merge and then do some simple arithmetic on, but the results are not what I want. File 1 has the columns data1, data2, data3 and a variable number of rows, x. File 2 has x rows and a single column; the value in that column is the final value of data3 (the value at row x of file 1). The idea is to subtract this final value from every value in column data3 and append the result as a new column.

However, when I run the command below, the final value is subtracted only in the last row, not in all rows as intended (example below).

awk 'BEGIN { while (getline < "file2" > 0) f2data[$1] = $1 } { n[$1]=(f2data[$2]-$2) } {print $0 " " n[$1]}' file1

and I get the following results:

2003 4.746 -4.746
2004 4.473 -4.473
2005 4.929 -4.929
2006 4.612 -4.612
2007 4.527 -4.527
2008 5.192 -5.192
2009 4.853 -4.853
2010 4.571 0


Any help would be appreciated.

Thanks,

~Matt
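
A note on why the command above only gets the last row right: f2data is indexed by the values read from file2 (all 4.571), but it is looked up with $2 from file1. Only the 2010 row has $2 equal to 4.571, so every other lookup returns an empty string, which awk treats as 0, and the result is 0 - $2. Since file2 only repeats the last value of file1's data column, one possible workaround is to read file1 twice and leave file2 out. The following is a minimal sketch under that assumption, using the sample data from post #2; the variable name last is illustrative and not from the thread.

```shell
# Work in a scratch directory with the sample file1 from this thread.
cd "$(mktemp -d)"
cat > file1 <<'EOF'
2003 4.746
2004 4.473
2005 4.929
2006 4.612
2007 4.527
2008 5.192
2009 4.853
2010 4.571
EOF

# First pass (NR == FNR): remember column 2; after the pass it holds the final value.
# Second pass: print each row followed by column 2 minus that final value.
result=$(awk 'NR == FNR { last = $2; next }
              { printf "%s %.3f\n", $0, $2 - last }' file1 file1)
printf '%s\n' "$result"
```

With this data the 2003 row ends in 0.175 and the 2010 row ends in 0.000.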



#1
August 25, 2010 at 19:29:21
Do you want us to guess what your input files look like, and then guess how they become the output you've shown?




#2
August 25, 2010 at 21:16:53
Well, no, that would be in the two lines that state: file(y) consists of...

To be more specific to this example:

file 1:

2003 4.746
2004 4.473
2005 4.929
2006 4.612
2007 4.527
2008 5.192
2009 4.853
2010 4.571

file 2 (same number of rows as file1):
4.571
4.571
4.571
4.571
4.571
4.571
4.571
4.571

Hopefully that is clearer.



#3
August 25, 2010 at 21:55:25
awk '{getline line <"file2"; split(line,a," "); $2=$2-a[2]; }{print}' file1
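
One thing to watch in the line above: file2 has a single column, so after split(line,a," ") the value sits in a[1]. a[2] is empty, awk treats it as 0, and $2 stays numerically unchanged. Below is a minimal sketch of the same read-one-line-per-record idea without the split, assuming the sample files from post #2. It appends the difference as a new column instead of overwriting $2, and the exit on a short file2 is an added safeguard, not part of the original reply.

```shell
# Work in a scratch directory with the sample files from this thread.
cd "$(mktemp -d)"
printf '%s\n' '2003 4.746' '2004 4.473' '2005 4.929' '2006 4.612' \
              '2007 4.527' '2008 5.192' '2009 4.853' '2010 4.571' > file1
printf '%s\n' 4.571 4.571 4.571 4.571 4.571 4.571 4.571 4.571 > file2

result=$(awk '{
    # Read the next line of file2 into "line"; stop if file2 runs out.
    if ((getline line < "file2") <= 0) exit
    # file2 has one column, so the whole line is the value to subtract.
    printf "%s %.3f\n", $0, $2 - line
}' file1)
printf '%s\n' "$result"
```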




#4
August 25, 2010 at 23:10:13
awk ' BEGIN { x=1;
while (getline < "file2" > 0)
   {
   f2data[x] = $1
   x++
   }
}
{
printf("%s %s\n", $0, f2data[NR]-$2)
}' file1



#5
August 26, 2010 at 20:50:18
Thank you for the responses.

Ghostdog's code doesn't seem to work quite right. It just displays the data from file2.

The code from nails works, sort of. As can be seen from the results below, the discrepancies occur for the first and last data points. The 2010 record should be 0 by definition (4.571-4.571=0) and the 2003 record should not be simply the negative of itself (4.746-4.571=0.175). It seems in both these cases, the final value (4.571) isn't being accurately subtracted.

Any thoughts?

2003 4.746 -4.746
2004 4.473 0.0981
2005 4.929 -0.3578
2006 4.612 -0.0407
2007 4.527 0.0444
2008 5.192 -0.6205
2009 4.853 -0.2814
2010 4.571 0.0007
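
An observation on these numbers: adding each result to its file1 value gives 4.5711, 4.5712, and so on up to 4.5717 for rows 2 through 8, not 4.571, and row 1 behaves as if f2data[1] were empty. That pattern is what one would expect if the script from #4 had been entered on a single line, so that the statements f2data[x] = $1 and x++ ran together as f2data[x] = $1 x++, in which case awk concatenates $1 with the counter. This is only a guess from the output; nothing in the thread confirms it. Below is a sketch of a variant that stays correct when written on one line, assuming the sample files from post #2. The name n and the ++n index are illustrative, the getline is parenthesized so that < and > cannot be grouped differently, and the operands are ordered to give 4.746 - 4.571 = 0.175 as described above.

```shell
# Work in a scratch directory with the sample files from this thread.
cd "$(mktemp -d)"
printf '%s\n' '2003 4.746' '2004 4.473' '2005 4.929' '2006 4.612' \
              '2007 4.527' '2008 5.192' '2009 4.853' '2010 4.571' > file1
printf '%s\n' 4.571 4.571 4.571 4.571 4.571 4.571 4.571 4.571 > file2

# BEGIN: load file2 into f2data[1..n], one value per row.
# Main rule: subtract the file2 value for the same row number (FNR).
result=$(awk 'BEGIN { while ((getline line < "file2") > 0) f2data[++n] = line } { printf "%s %.3f\n", $0, $2 - f2data[FNR] }' file1)
printf '%s\n' "$result"
```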



#6
August 26, 2010 at 22:49:29
I don't understand why the first line isn't reading: 4.746-4.571=0.175. It is for me. What version of awk are you using? I am using nawk for Solaris 9.

Try using printf's float specifier; that might help:

printf("%s %.4f\n", $0, f2data[NR]-$2)




#7
August 27, 2010 at 19:43:24
Using printf's float specifier is a no go. Same output is generated.

version: GNU Awk 3.1.6a

Yes, it is very frustrating. Something is off somewhere and I have no access to fix it. It's being run on a Linux box (I think, but it may be an 8-core Sun) at my old university some 2000 miles away. I've been in contact with a few people who work on this data nearly daily, and they don't have these issues. When I run their code on my account, it fails. When they run their code on their accounts, it works. I've spent more time coming up with workarounds for something simple that just refuses to work (AWK) than I have actually working on the data. Apparently this is all run by faculty, there is only one Unix administrator on campus, and they don't talk. So, I'm expected to figure it out.

