Solved How sum a column based in another column

December 1, 2015 at 04:21:40
Specs: Linux Red Hat, T5240
Hi guys,

I have a file with the following output:
[telorb@CCNPE1-io2 CcnCounters]> /tmp/a.txt
a 1
a 2
a 4
b 10
b 12
c 15
c 1

I would like to sum the column $2 of all same column

Example, the output i expected, in this case, is:
a 7
b 22
c 16

Does anyone know how do that ? Is possible do it in awk/sed ?


See More: How sum a column based in another column

Report •

#1
December 1, 2015 at 08:29:17
✔ Best Answer
Use an awk associative array:

awk ' {
a[$1]=a[$1] + $2
} END {
for(i in a)
   print i, "",  a[i] } ' /tmp/a.txt

Let me know if you have any questions.

message edited by nails


Report •

#2
December 1, 2015 at 12:44:54
Hello Nails !!!!!!!! :)

Your solution worked ... but i dont understood ... could you explain me your awk ?
For example, where it compare the value of the lines of column $1 and sum the column $2? Is it this entry:
Compare the lines: "a[$1]=a[$1]"
If lines is equal, sum column $2: a[$1]=a[$1] + $2

Is correct ?

Other doubt:
I have two files (not only one).
How can i know the average ?

For example: If a have:
[telorb@CCNPE1-io2 CcnCounters]> /tmp/a.txt
a 1
a 2
a 4
b 10
b 12
c 15
c 1

[telorb@CCNPE1-io2 CcnCounters]> /tmp/aa.txt
a 1
a 2
a 4
b 10
b 12
c 15
c 1

instead output
a 14
b 44
c 32

the output I expected is:
a 7
b 22
c 16


Report •

#3
December 1, 2015 at 14:12:13
An associative array allows using an alpha character as an index. So this means something like this:

a[$1]=a[$1] + $2

means adding the second field to the contents of a[$1].

in this example, awk parses the line into two fields $1 and $2. So, the first line example field $1 = a and $2 = 1 which means

a[a] = a[a] +1

The next line $1 = a and $2 = 2 which means

a[a] = a[a] + 2

and so on.

Regarding your second question use something like this:

awk ' {
if(myfile != FILENAME) # count the number of files
   cnt++
myfile=FILENAME
a[$1]=a[$1] + $2
} END {
for(i in a)
   print i, "",  a[i]/cnt
   } ' /tmp/a.txt /tmp/aa.txt

I take it by average you mean to divide by the number of files - 2 in this case. The awk internal variable is FILENAME. When the FILENAME variable changes, add 1 to the number of files.

Then, when printing out each array element, divide by the number of files, cnt.


message edited by nails


Report •

Related Solutions

#4
December 4, 2015 at 07:31:54
Nails, perfect !!

I was these days in a travel, sorry for dont answer before.

Your solution works for me ... i just have more one doubt:

If I need do:

cat /tmp/a*.txt | awk '{if(myfile != FILENAME) cnt++;myfile=FILENAME;a[$1]=a[$1] + $2} END {for(i in a) print i, "", a[i]/cnt }' 

It will not work, of course
and I cant do:

awk '{if(myfile != FILENAME) cnt++;myfile=FILENAME;a[$1]=a[$1] + $2} END {for(i in a) print i, "", a[i]/cnt }'  /tmp/a*.txt

what is the solution to, using awk, get the measure ?

I am asking that because my original command is (followed by its output):

[telorb@CCNPE1-io2 CcnCounters]> cat A20150619.01* | grep -A1 -i system | xargs | sed -e 's/<\/r>/<\/r>\n/g' -e 's/ -- <moid>CcnCounters=//g' -e 's/, Source = _SYSTEM<\/moid> </: /g' -e 's/<\/r>//g' -e 's/r>//g' -e 's/xml>/xml>\n/g' | sort | grep -v CTYPE
RelaySms-FinalReport-Successful: 44509
RelaySms-FinalReport-Successful: 44691
RelaySms-FinalReport-Successful: 45063
RelaySms-FinalReport-Successful: 46285
RelaySms-FinalReport-Successful: 46415
RelaySms-FinalReport-Successful: 46830
RelaySms-FinalReport-Successful: 47820
RelaySms-FirstInterrogation-InternalError-Rejected: 182
RelaySms-FirstInterrogation-InternalError-Rejected: 188
RelaySms-FirstInterrogation-InternalError-Rejected: 189
RelaySms-FirstInterrogation-InternalError-Rejected: 204
RelaySms-FirstInterrogation-InternalError-Rejected: 216
RelaySms-FirstInterrogation-InternalError-Rejected: 217
RelaySms-FirstInterrogation-InternalError-Rejected: 253
RelaySms-FirstInterrogation-InternalError-Rejected: 258
RelaySms-FirstInterrogation-InternalError-Rejected: 271
RelaySms-FirstInterrogation-InternalError-Rejected: 299
RelaySms-FirstInterrogation-InternalError-Rejected: 309
RelaySms-FirstInterrogation-InternalError-Rejected: 409
RelaySms-FirstInterrogation-Successful: 48936
RelaySms-FirstInterrogation-Successful: 50860
RelaySms-FirstInterrogation-Successful: 51679


and i need put your awk after this command above to know the meausere for each group.


Thanks for help my friend !

message edited by surfistadesampa


Report •

#5
December 8, 2015 at 08:23:01
awk doesn't work well with a variable number of input files. The only way I see, is to build a dynamic awk program using a Unix here document. This builds the awk program and includes the variable file names. Within the HERE document, I try to verify this puts all the file names on one line:

$(ls /tmp/a*.txt|xargs)

See this link for more info on HERE documents:

https://en.wikipedia.org/wiki/Here_...


#!/bin/ksh

# use a Unix HERE document to create a new program
cat << EOF > new.ss
#!/bin/ksh

awk ' {
if(myfile != FILENAME) # count the number of files
   cnt++
myfile=FILENAME
a[\$1]=a[\$1] + \$2  # need to escape the dollar sign
} END {
for(i in a)
   print i, "",  a[i]/cnt
   } ' $(ls /tmp/a*.txt|xargs)
EOF
# make the new program executable
chmod 755 new.ss
# now, run it
new.ss


Report •

#6
December 8, 2015 at 19:47:51
If a solution works, you should mark it the best answer. It helps other people to get a quick solution after getting referred to this page by Google you know.

Report •

Ask Question