Can anyone point me in the right direction to compare two files and combine them using awk?
Example:
File1 Data
2442,Sep,16,2006,ACTIVE
2068,Aug,17,2006,ACTIVE
2245,Sep,20,2006,ACTIVE
2044,Jun,29,2006,INACTIVE
File2 Data
a024831,2068
a636090,2069
a639911,2044
What I want is to compare the two files and where I see 2068 in file1 and 2068 in file2 create file3 so it looks like this:
a024831,Aug,17,2006,ACTIVE
I hope I haven't confused anyone...HELP!

At startup, awk loads file2 into an array (keyed by the number, with the ID as the value) so that it can look up each line of file1 as it is processed.
awk -F, 'BEGIN {
while ((getline < "file2") > 0)
f2array[$2] = $1
OFS=","}
{if (f2array[$1])
print f2array[$1],$2,$3,$4,$5
#else
# print $1 " not listed in file2" > "unmatched"
}' file1
You can uncomment the else branch if you want the numbers with no match written to a file named unmatched. If this were a shell script processing line by line, it would need >> instead of > to append multiple lines, but awk keeps each output file open until it is explicitly closed or the program ends, so a single > is enough (unless you want to append unmatched lines across multiple awk runs).
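
To see the single > behavior in action, here is a minimal sketch that recreates the sample data from the question and runs the program above with the else branch enabled. The scratch directory from mktemp and the output name file3 are my assumptions for the demo, not something the posts specify.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
2442,Sep,16,2006,ACTIVE
2068,Aug,17,2006,ACTIVE
2245,Sep,20,2006,ACTIVE
2044,Jun,29,2006,INACTIVE
EOF
cat > file2 <<'EOF'
a024831,2068
a636090,2069
a639911,2044
EOF

# Same program as above, with the else branch uncommented.
awk -F, 'BEGIN {
    while ((getline < "file2") > 0)
        f2array[$2] = $1
    OFS = ","
}
{
    if (f2array[$1])
        print f2array[$1], $2, $3, $4, $5
    else
        print $1 " not listed in file2" > "unmatched"
}' file1 > file3

cat file3       # matched lines, with the number replaced by the file2 ID
cat unmatched   # one line per unmatched number, even with a single >
```

With the sample data, file3 should hold the lines for 2068 and 2044, and unmatched should hold one line each for 2442 and 2245.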

I am reposting with the spacing preserved for easier reading.
awk -F, 'BEGIN {
    while ((getline < "file2") > 0)
        f2array[$2] = $1
    OFS=","
}
{
    if (f2array[$1])
        print f2array[$1],$2,$3,$4,$5
    #else
    #    print $1 " not listed in file2" > "unmatched"
}' file1
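
A common alternative to the getline loop is to name both files on the command line and let awk read them in turn. This is only a sketch using the sample data from the question; the array name id, the scratch directory, and the output name file3 are my own choices. While awk is reading the first file, NR (records read so far) equals FNR (records read from the current file), which is how the first rule knows it is still building the lookup table. The ($1 in id) test is a swap for the truthiness test above: it checks membership without creating empty array entries.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
2442,Sep,16,2006,ACTIVE
2068,Aug,17,2006,ACTIVE
2245,Sep,20,2006,ACTIVE
2044,Jun,29,2006,INACTIVE
EOF
cat > file2 <<'EOF'
a024831,2068
a636090,2069
a639911,2044
EOF

# file2 is listed first so the lookup table is complete before file1 is read.
awk -F, -v OFS=, '
    NR == FNR { id[$2] = $1; next }              # first file: map number -> ID
    $1 in id  { print id[$1], $2, $3, $4, $5 }   # second file: print matches only
' file2 file1 > file3

cat file3
```

One caveat with this idiom: if file2 is empty, NR == FNR is also true while file1 is being read, so file1 would be treated as the lookup table and nothing would be printed.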

It's not displaying any data in the new file I'm creating. Here is how I have the script set up:
awk -F, 'BEGIN {
while ((getline < $DATA2) > 0)
f2array[$2] = $1
OFS=","}{if (f2array[$1])
print f2array[$1],$2,$3,$4,$5
}' $DATA1

Inside the single-quoted awk program, shell variables are not expanded. There are a handful of ways to get shell variables into awk, and I post two solutions below. The first plays tricks with the single quotes, and the second passes the variable on the command line with -v. The awk variable data2 and the shell variable DATA2 could share a name, but I show them in different case to make clear which is which.
DATA1=file1
DATA2=file2
awk -F, 'BEGIN {
    while ((getline < "'$DATA2'") > 0)
        f2array[$2] = $1
    OFS=","
}
{
    if (f2array[$1])
        print f2array[$1],$2,$3,$4,$5
}' $DATA1

DATA1=file1
DATA2=file2
awk -F, -v data2=$DATA2 'BEGIN {
    while ((getline < data2) > 0)
        f2array[$2] = $1
    OFS=","
}
{
    if (f2array[$1])
        print f2array[$1],$2,$3,$4,$5
}' $DATA1
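
For completeness, here is a sketch of a third way to hand the file name to awk: export the shell variable and read it through awk's ENVIRON array. It also helps explain why the earlier script produced nothing. Inside the quoted program, awk treats DATA2 as one of its own (unset) variables, so $DATA2 means field $0, which is empty in BEGIN. With no usable file name, getline reads nothing (some awks report an error instead) and f2array stays empty. The scratch directory and the output name file3 below are assumptions for the demo.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
2442,Sep,16,2006,ACTIVE
2068,Aug,17,2006,ACTIVE
2245,Sep,20,2006,ACTIVE
2044,Jun,29,2006,INACTIVE
EOF
cat > file2 <<'EOF'
a024831,2068
a636090,2069
a639911,2044
EOF

DATA1=file1
DATA2=file2
export DATA2    # ENVIRON only sees variables that are exported

awk -F, 'BEGIN {
    # ENVIRON["DATA2"] holds the value of the exported shell variable DATA2.
    while ((getline < ENVIRON["DATA2"]) > 0)
        f2array[$2] = $1
    OFS = ","
}
{
    if (f2array[$1])
        print f2array[$1], $2, $3, $4, $5
}' "$DATA1" > file3

cat file3
```

Quoting "$DATA1" here (and writing -v data2="$DATA2" in the second solution above) keeps file names containing spaces in one piece. Another difference worth knowing: -v interprets backslash escapes in the value, while ENVIRON passes the string through unchanged.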

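How about this in Python:
f2 = open("file2.txt").readlines()
number = []  # numbers taken from the last field of each file2 line
for items in f2:
    number.append(items.split(",")[-1].strip())
for lines in open("file1.txt"):
    for n in number:
        # Compare against the first field only; "n in lines" would also
        # match the number anywhere in the line (the year, for example).
        if lines.split(",")[0] == n:
            print(lines, end="")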

Or, if you can sort both files on the join field first:
$ join -t, -1 1 -2 2 file1 file2
2044,Jun,29,2006,INACTIVE,a639911
2068,Aug,17,2006,ACTIVE,a024831
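
Since the sample files are not already sorted on the join field, a full run needs a sort step first. Below is a sketch using the sample data from the question; the scratch directory and the .sorted and file3 names are made up for the demo. The -o list is an addition to the command above: it selects the ID from file2 followed by fields 2 to 5 of file1, which gives the layout asked for in the original question.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
2442,Sep,16,2006,ACTIVE
2068,Aug,17,2006,ACTIVE
2245,Sep,20,2006,ACTIVE
2044,Jun,29,2006,INACTIVE
EOF
cat > file2 <<'EOF'
a024831,2068
a636090,2069
a639911,2044
EOF

# join expects both inputs sorted on their join fields, using the same
# collation order it uses itself, hence LC_ALL=C on every command.
LC_ALL=C sort -t, -k1,1 file1 > file1.sorted
LC_ALL=C sort -t, -k2,2 file2 > file2.sorted

# -o FILE.FIELD picks the output columns: file2 field 1, then file1 fields 2-5.
LC_ALL=C join -t, -1 1 -2 2 -o 2.1,1.2,1.3,1.4,1.5 \
    file1.sorted file2.sorted > file3

cat file3
```

Unlike the awk versions, the result comes out in sorted order of the join field rather than in file1's original order.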
