awk scripting using 2 input files

February 18, 2009 at 18:36:06
Specs: unix
Does anyone know how to read a pattern from one file (file1) and then search for that pattern in another file (file2)? The output should be every line in file2 that contains a pattern from file1. file1 and file2 will have different numbers of lines.

file1
abcd 1
bcde 2
abcd 3

file2
abcd 2 aa bb cc
bcde 2 bb cc dd

output
bcde 2 bb cc dd

Any assistance would be greatly appreciated.

Thanks




#1
February 19, 2009 at 10:15:02
Two things:

1) I'm using Solaris, so I'm using nawk.

2) Since I'm reading file2.txt into an array, if file2.txt is large, you could have a problem with performance.

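nawk ' BEGIN { cnt=0;
   # load every line of file2.txt into the array n
   # (getline is parenthesized so that "> 0" tests its return value)
   while ( (getline line < "file2.txt") > 0 )
      n[++cnt]=line
}
{
# for each line of file1.txt, print every stored file2.txt line it matches;
# match() treats $0 as a regular expression found anywhere in the line
for(i=1; i<=cnt; i++)
   if(match(n[i],$0) > 0)
      print n[i]

} ' file1.txt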



#2
February 20, 2009 at 07:49:54
Thanks for the code that you provided. It appears to be partially working, but it also returns incorrect information.

To give a little more background, file1 has 1193 records and file2 has 29300 records. I am trying to use the first two columns of file1 to index into file2. The code that was provided appears to work for the first 212 records in file1, but after that something happens and incorrect data is periodically written.

I was able to get another awk script to run, but it is very inefficient: it does not use an array, it opens and closes both files numerous times, and it takes an hour to run. Below is an excerpt of an sdiff between a known good file and the output of the code that was provided. The output of the script is on the left-hand side and the good data is on the right.

sdiff
NRFLVAJTDS002B1B 12 0 MGW 119 28        NRFLVAJTDS002B1B 12 0 MGW 119 28
NRFLVAJTDS002B1B 120 0 MGW 104 17    <
NRFLVAJTDS002B1B 121 0 MGW 104 17    <
NRFLVAJTDS002B1B 122 0 MGW 104 17    <
NRFLVAJTDS002B1B 123 0 MGW 104 17    <
NRFLVAJTDS002B1B 124 0 MGW 104 17    <
NRFLVAJTDS002B1B 125 0 MGW 127 29    <
NRFLVAJTDS002B1B 126 0 MGW 127 29    <
NRFLVAJTDS002B1B 127 0 MGW 127 29    <
NRFLVAJTDS002B1B 128 0 MGW 127 29    <
NRFLVAJTDS002B1B 129 0 MGW 127 29    <
NRFLVAJTDS002B1B 22 0 MGW 119 28        NRFLVAJTDS002B1B 22 0 MGW 119 28

Any information that you could provide on why the script is returning false data would be greatly appreciated.

Thanks
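
A likely explanation, judging only from the sdiff excerpt: the match() call in reply #1 treats each file1 line as a regular expression and accepts it anywhere in a file2 line, so a key such as "NRFLVAJTDS002B1B 12" also matches the file2 records that begin "NRFLVAJTDS002B1B 120" through "NRFLVAJTDS002B1B 129". The sketch below is one possible alternative, not the original poster's script. It assumes the key is exactly the first two whitespace-separated fields, as described above, holds only the file1 keys in an array, and reads file2 once. The scratch directory and the sample rows are made up from lines quoted in this thread. On Solaris, nawk would be used in place of awk.

```shell
#!/bin/sh
# Sample data assembled from lines quoted in the thread (illustrative only).
dir=$(mktemp -d)
printf '%s\n' 'abcd 1' 'bcde 2' 'abcd 3' 'NRFLVAJTDS002B1B 12' > "$dir/file1.txt"
printf '%s\n' 'abcd 2 aa bb cc' \
              'bcde 2 bb cc dd' \
              'NRFLVAJTDS002B1B 12 0 MGW 119 28' \
              'NRFLVAJTDS002B1B 120 0 MGW 104 17' > "$dir/file2.txt"

result=$(awk '
    # First file on the command line (file1.txt):
    # store each key, built from the first two fields, as an array index.
    NR == FNR { want[$1, $2]; next }

    # Second file (file2.txt): print the line only when its first two
    # fields are exactly equal to a stored key (no regex, no prefix match).
    ($1, $2) in want
' "$dir/file1.txt" "$dir/file2.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because the comparison is an exact array lookup on the two fields, the "12" key no longer picks up the "120" record, and each file is read only once.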



