Read two files with AWK

Original Message
Name: abd73fr
Date: December 17, 2004 at 09:35:35 Pacific
Subject: Read two files with AWK
OS: Macintosh
CPU/Ram: PowerMac G4
Comment:

Hi,

I have two files:

tst1
1 1 1 2 2 2 2 2
2 2 2 2 2 2 2 2
0 1 1 1 1 1 1 1

tst2
5 6 5 8 8 9 8 8
8 8 9 8 8 8 8 9
0 5 6 5 5 6 5 5

The two files have the same dimensions (i.e. the same
NF and NR).
I want to read the file tst1 and get the coordinates (x,y),
i.e. the field number and the record number (NR) in AWK, of each distinct value in it.
Then, for each value, I want the sum of the elements in tst2 that sit at the same
positions as that value in tst1.

More clearly:
in the file tst1 and for the number (1), I can print NF & NR
as follows:

x=`awk '{for (i=1; i<=NF; i++) if ($i==1) printf("%d ", i)}' tst1`
y=`awk '{for (i=1; i<=NF; i++) if ($i==1) printf("%d ", NR)}' tst1`

==>

NFs: 1 2 3 2 3 4 5 6 7 8
NRs: 1 1 1 3 3 3 3 3 3 3
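
The two commands above can probably be folded into a single pass that prints each coordinate as a "row column" pair. This is only a sketch: the function name coords, the variable name want, and the temporary directory are assumptions made for illustration, not part of the original post.

```shell
# Sample data from the question (tst1), written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1 1 1 2 2 2 2 2' '2 2 2 2 2 2 2 2' '0 1 1 1 1 1 1 1' > "$tmp/tst1"

# Print "record field" for every field equal to the wanted value.
# coords and want are hypothetical names used only for this sketch.
coords() {
    awk -v want="$1" '{
        for (i = 1; i <= NF; i++)      # awk fields are numbered from 1
            if ($i == want)
                print NR, i
    }' "$2"
}

coords 1 "$tmp/tst1"
```

For the value 1 this prints ten pairs, starting with "1 1" and ending with "3 8", which matches the NFs/NRs lists above read column by column.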

In the file tst2, I want to sum the entries that have the same
(x,y) as the number 1 in tst1.

For this example, I expect:

sum (for 1 in tst1) = (5+6+5+5+6+5+5+6+5+5) = 53
sum (for 2 in tst1) =
(8+8+9+8+8+8+8+9+8+8+8+8+9) = 107

I hope this is clear and feasible :)

Thanks



Response Number 1
Name: Jim Boothe
Date: December 17, 2004 at 12:12:20 Pacific
Reply:

F1=tst1
F2=tst2

if [ $(wc -l < $F1) -ne $(wc -l < $F2) ] ;
then
   echo 'Files must have same # lines'
   exit 1
fi

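awk -v F1="$F1" '{
# First file: save each line, indexed by record number
if (FILENAME==F1)
   {t1line[NR] = $0
    next}
else
   # First record of the second file: note how many records file 1 had
   if (nbrrecs==0)
      nbrrecs=NR-1

# NR keeps counting across files, so map it back to a file-1 line number
origNR=NR-nbrrecs

split(t1line[origNR],t1words)

# Add each tst2 field to the total for the tst1 value at the same position
for (i=1;i<=NF;i++)
   {t1word=t1words[i]
   totals[t1word]=totals[t1word]+$i}
}

END {
for (i in totals)
    print i, totals[i]
}' "$F1" "$F2"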
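
For comparison, the same bookkeeping is often written with the NR == FNR idiom, which is true only while awk is reading the first file named on the command line. The sketch below is an alternative under stated assumptions, not the script posted above: the function name sum_by_key, the array names, and the scratch directory are invented for illustration, and the idiom misbehaves if the first file is empty.

```shell
# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1 1 1 2 2 2 2 2' '2 2 2 2 2 2 2 2' '0 1 1 1 1 1 1 1' > "$tmp/tst1"
printf '%s\n' '5 6 5 8 8 9 8 8' '8 8 9 8 8 8 8 9' '0 5 6 5 5 6 5 5' > "$tmp/tst2"

# sum_by_key KEYFILE VALUEFILE  (hypothetical helper name)
sum_by_key() {
    awk '
    NR == FNR { key[FNR] = $0; next }   # first file: remember each line
    {
        split(key[FNR], k)              # keys for the matching row
        for (i = 1; i <= NF; i++)
            totals[k[i]] += $i          # add the value under its key
    }
    END { for (v in totals) print v, totals[v] }
    ' "$1" "$2" | sort -n               # for-in order is unspecified, so sort
}

sum_by_key "$tmp/tst1" "$tmp/tst2"
```

On the sample files this prints "0 0", "1 53" and "2 107", one per line. The 0 key shows up because tst1 contains a 0 in row 3, column 1.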


Response Number 2
Name: Jim Boothe
Date: December 17, 2004 at 12:30:41 Pacific
Reply:

I like this version better.  I load file1 into the array in the BEGIN block (I have to keep my own line counter at this stage, since NR is not updated there).  The processing of file2 can then use NR directly instead of an adjusted NR.

F1=tst1
F2=tst2

if [ $(wc -l < $F1) -ne $(wc -l < $F2) ] ;
then
   echo 'Files must have same # lines'
   exit 1
fi

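awk -v F1="$F1" '\
BEGIN {
# Read file 1 into an array, counting lines by hand (NR is not set here)
while ((getline < F1) > 0)
   {nr++
    t1line[nr] = $0}
}

# Main rule sees only file 2, so NR is also the matching file-1 line number
{split(t1line[NR],t1words)

 for (i=1;i<=NF;i++)
    {t1word=t1words[i]
    totals[t1word]=totals[t1word]+$i}
}

END {
for (i in totals)
    print i, totals[i]
}' "$F2"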
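
If holding the first file in an array is a concern, one possible variation is to join the two files line by line with paste and let awk treat the first half of each joined record as keys and the second half as values. This is a sketch that swaps in a different technique from the script above. It assumes both files have the same number of fields on every line, and the function name sum_by_key_paste is hypothetical.

```shell
# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1 1 1 2 2 2 2 2' '2 2 2 2 2 2 2 2' '0 1 1 1 1 1 1 1' > "$tmp/tst1"
printf '%s\n' '5 6 5 8 8 9 8 8' '8 8 9 8 8 8 8 9' '0 5 6 5 5 6 5 5' > "$tmp/tst2"

# sum_by_key_paste KEYFILE VALUEFILE  (hypothetical helper name)
sum_by_key_paste() {
    paste -d ' ' "$1" "$2" | awk '{
        half = NF / 2                       # first half: keys, second half: values
        for (i = 1; i <= half; i++)
            totals[$i] += $(i + half)
    }
    END { for (v in totals) print v, totals[v] }' | sort -n
}

sum_by_key_paste "$tmp/tst1" "$tmp/tst2"
```

Only one joined line is held at a time, at the cost of relying on the equal-field-count assumption instead of checking it.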


Response Number 3
Name: abd73fr
Date: December 20, 2004 at 07:22:03 Pacific
Reply:

It is superb...

Thanks a lot Jim... :)

In fact, the files tst1 and tst2 are just test files. In reality,
I have two files, each with 22200 lines and 146 fields,
and your program will help me a lot. :)

But I have one more question, please:
To verify the sum, how can I print the values?
E.g. for my example,
for (1) in tst1, I'd like to print:
5
6
5
5
6
5
5
6
5
5

What do I have to do?

Thanks again :))


Response Number 4
Name: Jim Boothe
Date: December 20, 2004 at 10:38:10 Pacific
Reply:

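The lines added since the previous version (green in the original post), i.e. the hdrprinted flag and the if (t1word==1) block, display every value accumulated for the value being checked: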

F1=tst1
F2=tst2

if [ $(wc -l < $F1) -ne $(wc -l < $F2) ] ;
then
   echo 'Files must have same # lines'
   exit 1
fi

awk -v F1=$F1 '\
BEGIN {
while ((getline < F1) > 0)
   {nr++
    t1line[nr] = $0}
}

{split(t1line[NR],t1words)
 hdrprinted=0
 for (i=1;i<=NF;i++)
    {t1word=t1words[i]
     totals[t1word]=totals[t1word]+$i
     if (t1word==1)
        if (hdrprinted==0)
            {printf "Line%3d: word%3d: %s\n",NR,i,$i
             hdrprinted=1}
         else
             printf "%13s%3d: %s\n"," ",i,$i }
}

END {
print "Summary totals:"
for (i in totals)
    print i, totals[i]
}' $F2

./sumit.sh
Line  1: word  1: 5
               2: 6
               3: 5
Line  3: word  2: 5
               3: 6
               4: 5
               5: 5
               6: 6
               7: 5
               8: 5
        &nbp      8: 5
Summary totals:
2 107
1 53


Response Number 5
Name: Jim Boothe
Date: December 20, 2004 at 10:46:57 Pacific
Reply:

Had a little posting error with the spaces - will try again ...

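The lines added since the earlier version (green in the original post), i.e. the hdrprinted flag and the if (t1word==1) block, display every value accumulated for the value being checked: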

F1=tst1
F2=tst2

if [ $(wc -l < $F1) -ne $(wc -l < $F2) ] ;
then
   echo 'Files must have same # lines'
   exit 1
fi

awk -v F1=$F1 '\
BEGIN {
while ((getline < F1) > 0)
   {nr++
    t1line[nr] = $0}
}

{split(t1line[NR],t1words)
 hdrprinted=0
 for (i=1;i<=NF;i++)
    {t1word=t1words[i]
     totals[t1word]=totals[t1word]+$i
     if (t1word==1)
        if (hdrprinted==0)
            {printf "Line%3d: word%3d: %s\n",NR,i,$i
             hdrprinted=1}
         else
             printf "%13s%3d: %s\n"," ",i,$i }
}

END {
print "Summary totals:"
for (i in totals)
    print i, totals[i]
}' $F2

./sumit.sh
Line  1: word  1: 5
               2: 6
               3: 5
Line  3: word  2: 5
               3: 6
               4: 5
               5: 5
               6: 6
               7: 5
               8: 5
Summary totals:
2 107
1 53
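
The listing above hard-codes the value 1 in the test if (t1word==1). A possible generalisation is to pass the value in with -v and print just the matching tst2 entries, one per line, followed by their sum. This is a minimal sketch: the function name values_for and the variables want, key and sum are assumptions for illustration, not part of the posted script.

```shell
# Sample data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1 1 1 2 2 2 2 2' '2 2 2 2 2 2 2 2' '0 1 1 1 1 1 1 1' > "$tmp/tst1"
printf '%s\n' '5 6 5 8 8 9 8 8' '8 8 9 8 8 8 8 9' '0 5 6 5 5 6 5 5' > "$tmp/tst2"

# values_for VALUE KEYFILE VALUEFILE  (hypothetical helper name)
values_for() {
    awk -v want="$1" -v F1="$2" '
    # Load the key file up front, as in the thread, with our own counter.
    BEGIN { while ((getline line < F1) > 0) key[++nr] = line }
    {
        split(key[NR], k)
        for (i = 1; i <= NF; i++)
            if (k[i] == want) {     # same position holds the wanted key
                print $i
                sum += $i
            }
    }
    END { print "sum:", sum + 0 }   # + 0 prints 0 when nothing matched
    ' "$3"
}

values_for 1 "$tmp/tst1" "$tmp/tst2"
```

For the value 1 this prints 5, 6, 5, 5, 6, 5, 5, 6, 5, 5 on separate lines and then "sum: 53", which can be checked against the summary totals.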



Response Number 6
Name: abd73fr
Date: December 21, 2004 at 08:12:34 Pacific
Reply:

Very nice :)

Thank you very much, Jim... your code works fine...

I wish you a very Merry Christmas and Happy New Year :)

See you later


Response Number 7
Name: Jim Boothe
Date: December 21, 2004 at 10:46:33 Pacific
Reply:

Thanks - Merry Christmas and Happy New Year to you also!

