Solved combine two files

December 4, 2011 at 22:09:45
Specs: linux, Pentium
file1 :
Class /Src/com/gzh/ge/JHelloWorld.java/com::gzh::ge::JHelloWorld GEN_012

Class /Src/HelloWorld.h/HelloWorld GEN_002

function /Src/HelloWorld.cpp/HelloWorld::memberFunc1 GEN_002

function /Src/com/gzh/ge/JHelloWorld.java/com::gzh::ge::JHelloWorld::JmemberFunc1 GEN_015

function /Src/HelloWorld.cpp/HelloWorld::memberFunc2 GEN_005

function /Src/com/gzh/ge/JHelloWorld.java/com::gzh::ge::JHelloWorld::JmemberFunc2 GEN_016

function /Src/HelloWorld.cpp/HelloWorld::memberFunc3 GEN_006

file2:
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::CorrectedImageManagement RAC
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::getInstance RAC
function /Src/HelloWorld.cpp/HelloWorld::memberFunc1 RAC
function /Src/HelloWorld.cpp/HelloWorld::memberFunc3 RAC


I want to print every line of file1. If the first two columns of a file1 line are identical to the first two columns of a line in file2, keep those two columns and file1's third column, and append the third column from file2. Lines that are in file2 but not in file1 should also be printed.


The output should look like this:


Class /Src/com/gzh/ge/JHelloWorld.java/com::gzh::ge::JHelloWorld GEN_012
Class /Src/HelloWorld.h/HelloWorld GEN_002
function /Src/HelloWorld.cpp/HelloWorld::memberFunc1 GEN_002 RAC
function /Src/com/gzh/ge/JHelloWorld.java/com::gzh::ge::JHelloWorld::JmemberFunc1 GEN_015
function /Src/HelloWorld.cpp/HelloWorld::memberFunc2 GEN_005
function /Src/com/gzh/ge/JHelloWorld.java/com::gzh::ge::JHelloWorld::JmemberFunc2 GEN_016
function /Src/HelloWorld.cpp/HelloWorld::memberFunc3 GEN_006 RAC

not related function or class is as below:
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::CorrectedImageManagement RAC
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::getInstance RAC


I am new to awk. Can you help me solve this problem? Thanks in advance.



#1
December 5, 2011 at 09:23:14
I would try to help you if I understood what you wanted. Your explanation is not clear.

Also, what is your column delimiter?



#2
December 5, 2011 at 17:01:44
OK, thanks! The delimiter is a blank (space). Of course, you could change it to a comma.


My requirements are:

First, I want to print every line of file1.
Second, when printing a line of file1, if its first two columns ($1, $2) are identical to the first two columns of a line in file2, append the third column of file2.
Third, if the first two columns of a file2 line cannot be found in file1, print that line as well.



#3
December 6, 2011 at 11:23:11
This works because file1 and file2 have 3 fields each. If that changes, this script breaks:

awk ' BEGIN {
   # load file2 into an array; parenthesize getline so the "> 0" test is unambiguous
   while ( (getline < "file2") > 0 )
      {
      ind=$1" "$2  # index into the array is field 1, followed by a space, and field 2
      myarr[ind]=$3
      }
}
{
# skip blank lines
if (NF == 0)
   next

ind=$1" "$2
if (ind in myarr)
   printf("%s %s\n", $0,  myarr[ind])   # matching key: append the third column of file2
else
   print $0

} ' file1
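
Post #2 mentions that the delimiter could be a comma instead of a blank. A minimal sketch of the same lookup for that case might look like the following. It assumes simple comma-separated lines with no quoted or embedded commas, and the file names file1.csv, file2.csv and out.csv, the temporary directory and the two-line sample data are made up for illustration:

```shell
# Work in a scratch directory so nothing existing is overwritten (assumption).
dir=$(mktemp -d) && cd "$dir"

# Tiny comma-separated stand-ins for file1 and file2 (hypothetical data).
printf '%s\n' \
  'function,/Src/HelloWorld.cpp/HelloWorld::memberFunc1,GEN_002' \
  'Class,/Src/HelloWorld.h/HelloWorld,GEN_002' > file1.csv
printf '%s\n' \
  'function,/Src/HelloWorld.cpp/HelloWorld::memberFunc1,RAC' > file2.csv

awk -F',' -v OFS=',' '
BEGIN {
    # read file2.csv into a variable so $0 and the fields are left alone
    while ((getline line < "file2.csv") > 0) {
        split(line, f, ",")
        myarr[f[1] OFS f[2]] = f[3]     # key: column 1, a comma, column 2
    }
}
NF == 0 { next }                        # skip blank lines
{
    ind = $1 OFS $2
    if (ind in myarr)
        print $0, myarr[ind]            # append column 3 of file2.csv
    else
        print
}' file1.csv > out.csv

cat out.csv
```

Only the field separator and the key separator change; the matching logic is the same as in the script above.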



#4
December 6, 2011 at 17:44:13
Yes. Thanks a lot.

But it only works for the first two requirements; the third is not handled. I realize my description of the third requirement was confusing. What I mean is that the lines "not related function or class is as below:
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::CorrectedImageManagement RAC
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::getInstance RAC " should also be printed out.



#5
December 7, 2011 at 08:32:03
✔ Best Answer
The only way I can see to do it is to create a parallel array, delarr. When an element is printed from myarr, delete the same element from delarr. The END section runs after awk has read all of file1, and it prints anything left in the delarr array:

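awk ' BEGIN {
   # load file2; parenthesize getline so the "> 0" test is unambiguous
   while ( (getline < "file2") > 0 )
      {
      # skip blank lines in file2
      if (NF == 0)
         continue
      ind=$1" "$2
      myarr[ind]=$3    # lookup table, never modified
      delarr[ind]=$3   # copy that shrinks as keys are matched
      }
}
{
# skip blank lines
if (NF == 0)
   next

ind=$1" "$2
if (ind in myarr)
   {
   printf("%s %s\n", $0,  myarr[ind])
   if (ind in delarr)
      delete delarr[ind]
   }
else
   print $0

}
END { # print any elements left (awk does not guarantee their order)

for (ind in delarr)
   printf("%s %s\n", ind,  delarr[ind])
}
' file1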
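
As a variation on the script above, here is a minimal sketch that names both files on the command line and uses the common NR == FNR idiom instead of getline. It keeps one lookup array plus a set of matched keys. The array names (extra, seen), the temporary directory, the shortened sample data and the merged.txt output file are assumptions made for illustration, not part of the original answer. Note that the idiom misbehaves if file2 is empty, and that awk does not define the order of the for (key in ...) loop.

```shell
# Work in a scratch directory so the real file1/file2 are not overwritten (assumption).
dir=$(mktemp -d) && cd "$dir"

# Shortened copies of the sample data from the question.
cat > file1 <<'EOF'
Class /Src/HelloWorld.h/HelloWorld GEN_002

function /Src/HelloWorld.cpp/HelloWorld::memberFunc1 GEN_002

function /Src/HelloWorld.cpp/HelloWorld::memberFunc2 GEN_005
EOF

cat > file2 <<'EOF'
function /Src/CorrectedImageManagement.cpp/CorrectedImageManagement::getInstance RAC
function /Src/HelloWorld.cpp/HelloWorld::memberFunc1 RAC
EOF

# file2 is listed first, so NR == FNR is true only while it is being read.
awk '
NF == 0 { next }                            # skip blank lines in either file
NR == FNR { extra[$1 " " $2] = $3; next }   # file2: remember its third column
{
    key = $1 " " $2
    if (key in extra) {
        print $0, extra[key]                # match: append column 3 of file2
        seen[key] = 1                       # mark the key as matched
    } else
        print
}
END {
    for (key in extra)                      # file2 lines that never matched
        if (!(key in seen))
            print key, extra[key]
}
' file2 file1 > merged.txt

cat merged.txt
```

With the shortened sample data, merged.txt holds the three file1 lines (memberFunc1 gets RAC appended) followed by the unmatched getInstance line from file2. Marking keys in seen rather than deleting them from extra means a key that occurs more than once in file1 still gets its extra column every time.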



#6
December 7, 2011 at 18:40:16
It works. Thank you for your great help!


#7
December 8, 2011 at 19:04:23
It is the best solution. Thanks!
