Computing.Net > Forums > Unix > awk question

awk question


Original Message
Name: narsman
Date: January 25, 2005 at 18:09:34 Pacific
Subject: awk question
OS: Windows NT
CPU/Ram: Intel Pentium 2.0 GHz 51
Comment:

Hi all, I just started taking a Unix class and need help with my assignment.

/home% cat file1
1file08
2file08
1file03
1file05
2file05
1file09
2file09

I need to group all lines that start with 1 and the same thing with lines that start with 2. My output should be:

GROUP1
(1,1file08)
(2,1file03)
(3,1file05)
(4,1file09)

GROUP2
(1,2file08)
(3,2file05)
(4,2file09)

I can use grep to group the lines, but the problem is the number in front of each one. Note that GROUP2 has no entry numbered 2 because there is no 2file03.

Thanks for any input you may have.

~Narsman




Response Number 1
Name: thepubba
Date: January 26, 2005 at 05:53:48 Pacific
Subject: awk question

Here is a simple Korn shell solution. With some work, you can make it better. I don't have time.


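#!/bin/ksh

# Collect the lines of each group into a space-separated list.
group1=
group2=

exec 3< ./junk999.dat        # open the data file on descriptor 3

while IFS= read -r -u3 line
do
    case $line in
        1* ) group1="$group1$line " ;;
        2* ) group2="$group2$line " ;;
    esac
done

exec 3<&-                    # close descriptor 3

/usr/bin/tput clear

# Print each group with a running counter (1, 2, 3, ...).
counter=1

print "\n\nGROUP1"

for name in $group1
do
    print "($counter,$name)"
    (( counter += 1 ))
done

counter=1

print "\nGROUP2"

for name in $group2
do
    print "($counter,$name)"
    (( counter += 1 ))
done

print "\n"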


Jerry



Response Number 2
Name: vgersh99
Date: January 26, 2005 at 07:20:14 Pacific
Subject: awk question

nawk -f nars.awk file1

here's nars.awk:
-------------------
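{
    # Group ID = the leading digits of the line; skip lines without them.
    if (!match($0, /^[0-9][0-9]*/) )
        next;
    else
        grpID=(substr($0, RSTART, RLENGTH));

    # Append the line to its group, separating entries with SUBSEP.
    arr[grpID] = (grpID in arr) ? arr[grpID] SUBSEP $0 : $0;
}

END {
    # Note: "for (i in arr)" visits the groups in an unspecified order.
    for ( i in arr ) {
        print "GROUP" i;
        n=split(arr[i], cellA, SUBSEP);
        # j is the position within the group, not a per-suffix number.
        for(j=1; j<=n; j++)
            printf("\t(%d,%s)\n", j, cellA[j]);
    }
}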
---------------

vlad
#include<disclaimer.h>



Response Number 3
Name: narsman
Date: January 26, 2005 at 10:16:14 Pacific
Subject: awk question

Thanks for the help, Jerry & vgersh99.



Response Number 4
Name: narsman
Date: January 26, 2005 at 13:54:27 Pacific
Subject: awk question

Hi vgersh99,

I tried your script and got this output:

GROUP2
(1,2file08)
(2,2file05) -> this should say (3,2file05)
(3,2file09) -> this should say (4,2file09)
GROUP1
(1,1file08)
(2,1file04)
(3,1file05)
(4,1file09)


-> The number 2 should be skipped because there is no 2file04.

Thanks,



Response Number 5
Name: vgersh99
Date: January 26, 2005 at 14:52:39 Pacific
Subject: awk question

Hmm... this is confusing.

Could you explain AGAIN how you derive the LEADING numbers before the ',' in:

GROUP2
(1,2file08)
(3,2file05)
(4,2file09)

I thought they were just sequence numbers...

vlad
#include<disclaimer.h>




Response Number 6
Name: narsman
Date: January 26, 2005 at 16:40:44 Pacific
Subject: awk question

Hi vgersh99,

Here is a copy of my script...

===============================
#! /bin/csh -f
# Collect the lines of each group with grep.
set grp1 = `grep '^1' file2`
set grp2 = `grep '^2' file2`

# Number each line by its position in the grep result.
set pcount1 = 1
echo "GROUP1"
foreach G1 ($grp1)
    echo "($pcount1,$G1)"
    @ pcount1 = $pcount1 + 1
end

echo ""
echo "GROUP2"
set pcount2 = 1
foreach G2 ($grp2)
    echo "($pcount2,$G2)"
    @ pcount2 = $pcount2 + 1
end
===============================

I really appreciate your help on this.

Thanks.



Response Number 7
Name: thepubba
Date: January 26, 2005 at 17:13:42 Pacific
Subject: awk question
Reply: (edit)

I'm sure Vlad will solve your awk problem. However, I'd recommend you forget csh for writing shell scripts. The csh is lacking in too many areas. Look at using bash, ksh or even sh. This is not just a personal opinion; it is shared by many system administrators. Ask around; you'll find that csh is rarely used for writing shell scripts.

My shell script is a good example: it reads the file through a file descriptor, so I don't need grep at all.

I see what you are after with the second group. It would be easy to modify my script to compare the values in each group and number them accordingly.
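As a rough sketch of that modification (not Jerry's actual code), the same read-and-collect loop can number each line by the order in which its suffix, the part after the group digit, first appears in the file. It is written for POSIX sh so it should also run under ksh or bash. The function name `group_lines` is hypothetical, and the sketch assumes only groups 1 and 2 and lines without spaces or glob characters.

```shell
#!/bin/sh
# Sketch (hypothetical helper "group_lines"): read lines such as "1file08"
# on standard input and print GROUP1/GROUP2 listings, numbering each line
# by the order in which its suffix ("file08", "file03", ...) first appears.
# Assumptions: only groups 1 and 2; no spaces or glob characters in lines.
group_lines() {
    seen=            # space-separated suffixes in first-seen order
    nseen=0
    group1=
    group2=
    while IFS= read -r line
    do
        [ -n "$line" ] || continue     # skip blank lines
        suffix=${line#?}               # drop the leading group digit
        # Find the suffix's position in $seen (0 if not seen yet).
        seq=0
        i=0
        for s in $seen
        do
            i=$((i + 1))
            if [ "$s" = "$suffix" ]; then
                seq=$i
                break
            fi
        done
        # A new suffix gets the next sequence number.
        if [ "$seq" -eq 0 ]; then
            nseen=$((nseen + 1))
            seen="$seen $suffix"
            seq=$nseen
        fi
        case $line in
            1*) group1="$group1 ($seq,$line)" ;;
            2*) group2="$group2 ($seq,$line)" ;;
        esac
    done
    echo "GROUP1"
    for entry in $group1; do echo "$entry"; done
    echo
    echo "GROUP2"
    for entry in $group2; do echo "$entry"; done
}
```

Usage: `group_lines < file1`. On the sample file1 this lists (1,2file08), (3,2file05) and (4,2file09) under GROUP2, skipping 2 because there is no 2file03.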




Response Number 8
Name: narsman
Date: January 26, 2005 at 17:58:09 Pacific
Subject: awk question

Hi Jerry. Yeah, I was told to get away from csh and consider learning Perl. As I said in my first post, I'm a newbie, so right now I'm just trying to get my feet wet. Thanks for the input.



Response Number 9
Name: vgersh99
Date: January 27, 2005 at 05:54:11 Pacific
Subject: awk question

sorry, I don't DO csh ;)

You posted your script, but I asked you to explain the algorithm.

vlad
#include<disclaimer.h>



Response Number 10
Name: narsman
Date: January 28, 2005 at 11:38:53 Pacific
Subject: awk question

Here's what goes on in the script...

The results of grep are assigned to grp1 and grp2.

$grp1 is 1file08[1] 1file04[2] 1file05[3] 1file09[4]

$grp2 is 2file08[1] 2file05[2] 2file09[3]


I put them in a loop (similar to {for (i=1; i<=NF; i++)} in awk).

The problem is 2file05[2] in $grp2: I'm trying to make this 2file05[3].

I hope this makes sense; I'm not very good at explaining things.

Thanks for trying though.

Actually I have another awk question which I will post in another topic.

TGIF!



Response Number 11
Name: Jim Boothe
Date: January 28, 2005 at 14:28:52 Pacific
Subject: awk question

My understanding is that column position 1 defines the "group", and columns 2-n form a control-break field.  We need to number the control breaks and generate a sequencing number based on that.

The first phase sequences the control breaks and formats the lines.  It puts a group number at the beginning of the line also to make sorting easy.  The output from the first phase (prior to sorting) would be:

1 (1,1file08)
2 (1,2file08)
1 (2,1file03)
1 (3,1file05)
2 (3,2file05)
1 (4,1file09)
2 (4,2file09)

After sorting, the final pass control breaks on the group number and inserts the GROUP header lines.  That temporary group number at the front of the line gets removed on this phase.


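#!/bin/ksh
seq=0
holdkey=
holdgroup=

# Phase 1: number the control breaks on columns 2-n and prefix each
# formatted line with its group (column 1) for sorting.
# Phase 2: sort by group, then numerically by sequence number (so that
# 10 sorts after 9), and insert a GROUP header at each group break.
while IFS= read -r line
do
    key=${line#?}              # columns 2-n: the control-break field
    group=${line%"$key"}       # column 1: the group number
    if test "$key" != "$holdkey" ; then
        holdkey=$key
        ((seq=seq+1))
    fi
    print -r -- "$group ($seq,$line)"
done < junk.txt |
sort -k1,1n -k2.2bn |
while read -r group line
do
    if test "$group" != "$holdgroup" ; then
        holdgroup=$group
        print "\nGROUP$group"
    fi
    print -r -- "$line"
done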

GROUP1
(1,1file08)
(2,1file03)
(3,1file05)
(4,1file09)

GROUP2
(1,2file08)
(3,2file05)
(4,2file09)

And by the way, this solution assumes any number of groups. If I knew that there would always be just groups 1 and 2, I would have taken an entirely different (simpler) approach.
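For comparison, here is a sketch (not from the thread) of the same idea done in a single awk pass with no separate sort step. It treats column 1 as the group and columns 2-n as the key, as Jim describes, but numbers each key by its first appearance anywhere in the input; that matches Jim's control break whenever equal keys sit on adjacent lines, as they do in the sample. Because `for (i in arr)` visits array elements in an unspecified order (which likely explains why GROUP2 came out first in the nawk run earlier in the thread), the END block sorts the group IDs itself. The wrapper name `number_groups` is hypothetical, and the sketch assumes each group ID is a single digit.

```shell
#!/bin/sh
# Sketch (hypothetical wrapper "number_groups"): any number of groups,
# one awk pass. Reads the files given as arguments, or standard input.
number_groups() {
    awk '
    NF == 0 { next }                          # skip blank lines
    {
        g   = substr($0, 1, 1)                # group: first character
        key = substr($0, 2)                   # control-break field: the rest
        if (!(key in seq)) seq[key] = ++nkeys # first appearance -> next number
        if (!(g in count)) { ngroups++; gid[ngroups] = g }
        out[g, ++count[g]] = "(" seq[key] "," $0 ")"
    }
    END {
        # for-in order is unspecified, so insertion-sort the group IDs
        # numerically before printing.
        for (i = 2; i <= ngroups; i++) {
            v = gid[i]
            for (j = i - 1; j >= 1 && gid[j] + 0 > v + 0; j--)
                gid[j + 1] = gid[j]
            gid[j + 1] = v
        }
        for (i = 1; i <= ngroups; i++) {
            g = gid[i]
            if (i > 1) print ""
            print "GROUP" g
            for (k = 1; k <= count[g]; k++) print out[g, k]
        }
    }' "$@"
}
```

Usage: `number_groups file1`, or pipe the data in on standard input. Keeping the per-group lines in an array indexed by (group, position) preserves input order within each group, so only the list of group IDs needs sorting.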



Response Number 12
Name: narsman
Date: February 1, 2005 at 14:53:07 Pacific
Subject: awk question

Hi Jim,

That worked real well. Thanks for your help!

~narsman

