Solved Substitution by ID

October 29, 2011 at 18:51:37
Specs: UNIX
Hello,

rawfile:
DATE,ID,RecordA,RecordB,Name
"01/15/2002","AA00CA",9.444,9.456,"PACKNGO"
"01/15/2002","AB00AA",0.001,0.006,"MILK"
"01/15/2002","AP00CB",1.234,1.123,"FRL"
"01/15/2002","BA83CA",45.444,45.55,""
"01/15/2002","CA11VT",-9,-9,""
"01/15/2002","ZQ12QT",10,10,""

updatelist:
"AP00CB" -20.0
"CA11VT" 9.999

Output:
DATE,ID,RecordA,RecordB,Name
"01/15/2002","AA00CA",9.444,9.456,"PACKNGO"
"01/15/2002","AB00AA",0.001,0.006,"MILK"
"01/15/2002","AP00CB",-20.0,-20.0,"FRL"
"01/15/2002","BA83CA",45.444,45.55,""
"01/15/2002","CA11VT",9.999,9.999,""
"01/15/2002","ZQ12QT",10,10,""

I would like to update rawfile using field 2 of updatelist, matching on ID. When an ID appears in both files, both RecordA and RecordB in rawfile should be replaced by the value in field 2 of updatelist. Is it possible to do this with UNIX tools such as nawk or sed? I'm hoping UNIX can handle this well.

Thank you in advance!



#1
October 30, 2011 at 15:19:42
✔ Best Answer
To keep a single field separator for both files, I first convert the updatelist separator from a space to a comma:

#!/bin/ksh

# change the updatelist field separator to a comma
sed 's/ /,/g' updatelist > newupdatelist
nawk ' BEGIN { FS=",";
   while ( (getline < "newupdatelist") > 0 )
      list[$1]=$2
}
{

if($2 in list)
   $3=list[$2]

print $0
} ' rawfile > newrawfile


#2
October 30, 2011 at 17:10:31
Hi Nails,

Thank you very much for your code; it gives me a great idea of the beauty of using UNIX!

I tried to run the code. I put it in a text file named "update_script" and then typed at the command line:

source update_script

but I got the following error:

unmatched '.


#3
October 30, 2011 at 21:46:18
Since you are using the source keyword, you must be using the "C" shell, csh or tcsh.

What I posted was a Korn shell (ksh) script. You cannot source a ksh script from csh, but you can execute one even if your parent shell is csh or tcsh.

Suppose your script is called myscript.sh. First, change the script to be executable:

chmod 755 myscript.sh

Then, execute it:

myscript.sh

or:

./myscript.sh
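
As a small illustration of the steps above, the following sketch creates a stand-in script, marks it executable, and runs it. The file contents and the `#!/bin/sh` first line are assumptions chosen so the example runs on most systems; the script in this thread starts with `#!/bin/ksh` instead. That first line selects the interpreter, so the script runs under it even when launched from csh or tcsh.

```shell
# Create a stand-in script (hypothetical contents; only the name comes from the thread).
cat > myscript.sh <<'EOF'
#!/bin/sh
echo "hello from myscript.sh"
EOF

# Make it executable for the owner, readable and executable for everyone else.
chmod 755 myscript.sh

# Run it by path; the leading ./ is needed when the current directory is not in PATH.
./myscript.sh
```

Typing the bare name `myscript.sh` only works when the current directory is listed in the search path, which is why the `./` form is the safer of the two.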


#4
October 31, 2011 at 17:03:37
Thanks again for your reply! My ksh was at another location.

I got it working, but with the following unexpected output:

DATE,ID,RecordA,RecordB,Name
"01/15/2002","AA00CA",9.444,9.456,"PACKNGO"
"01/15/2002","AB00AA",0.001,0.006,"MILK"
1.123 "FRL" "AP00CB" -20.0
"01/15/2002","BA83CA",45.444,45.55,""
"01/15/2002" "CA11VT" 9.999 -9 ""
"01/15/2002","ZQ12QT",10,10,""

I don't know why. I re-read the code several times. It seems logical to me. Could you please help me out with that?


#5
November 1, 2011 at 09:11:11
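First, I apologize; I missed the requirement that field 3 AND field 4 both need to be changed. Also, because assigning to a field makes awk rebuild the record using OFS (a space by default), the BEGIN block should set OFS="," alongside FS=","; otherwise the changed lines come out space-separated. The if statement should read as follows: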

if($2 in list)
   {
   $3=list[$2]
   $4=list[$2]
   }
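
Putting the pieces together, a complete run might look like the sketch below. It recreates a shortened copy of the sample files from the question, so the `cat` heredocs are only there to make the example self-contained. The `OFS=","` assignment is not part of the code as originally posted: without it, awk rejoins any record whose fields were modified with single spaces, which matches the space-separated lines in the output reported above. `awk` is used here on the assumption that it behaves like `nawk` for this program.

```shell
#!/bin/sh
# Shortened sample inputs taken from the question.
cat > rawfile <<'EOF'
DATE,ID,RecordA,RecordB,Name
"01/15/2002","AB00AA",0.001,0.006,"MILK"
"01/15/2002","AP00CB",1.234,1.123,"FRL"
"01/15/2002","CA11VT",-9,-9,""
EOF
cat > updatelist <<'EOF'
"AP00CB" -20.0
"CA11VT" 9.999
EOF

# Convert the updatelist separator to a comma, as in the original answer.
sed 's/ /,/g' updatelist > newupdatelist

# awk is used here; substitute nawk on systems that ship it.
awk 'BEGIN {
   FS = ","; OFS = ","            # OFS keeps commas when a record is rebuilt
   while ((getline < "newupdatelist") > 0)
      list[$1] = $2               # ID (quotes included) -> replacement value
}
{
   if ($2 in list) {
      $3 = list[$2]               # RecordA
      $4 = list[$2]               # RecordB
   }
   print $0
}' rawfile > newrawfile

cat newrawfile
```

Rows whose ID is not in updatelist are never modified, so they are printed exactly as read.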

Second, I think the problem is with the updatelist file. You moved it over from Windows, correct? In Windows/DOS, lines are terminated with a Carriage Return/Line Feed combination. In Unix/Linux, lines are terminated only with a Line Feed.

On your Unix box, if you vi the updatelist file, you should see something like this:

"AP00CB" -20.0^M
"CA11VT" 9.999

Deleting the Control-M at the end of the line should fix the problem.
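
For files too long to fix by hand in vi, one possible approach is to strip the carriage returns with `tr`. The sketch below first builds a stand-in file with DOS line endings to reproduce the situation; the names `updatelist.dos` and `updatelist.unix` are hypothetical and not from the thread.

```shell
# Build a stand-in updatelist with DOS line endings (CR LF) to reproduce the problem.
printf '"AP00CB" -20.0\r\n"CA11VT" 9.999\r\n' > updatelist.dos

# Count the lines that end in a carriage return; a non-zero count means cleanup is needed.
cr=$(printf '\r')
grep -c "${cr}\$" updatelist.dos

# Delete every carriage return and write the cleaned copy to a new file.
tr -d '\r' < updatelist.dos > updatelist.unix
```

Writing to a new file rather than redirecting back onto the input avoids truncating the file before `tr` has read it.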

Read up on the dos2unix command for more information.


#6
November 1, 2011 at 17:14:02
Hi Nails,

Yes, you are right: it was the Carriage Return that caused the problem. Your code works smoothly! It is very kind of you to provide this code. I picked up a lot from this forum. Thank you very much again for your great help and quick response!

