awk: prnt field val in 2nd line

Original Message
Name: willdevv
Date: January 9, 2008 at 06:16:27 Pacific
Subject: awk: prnt field val in 2nd line
OS: Unix
CPU/Ram: core 2 duo 4 gb
Model/Manufacturer: Apple 17
Comment:

I need to find the field locations that match a regexp in the
1st line, then use those field numbers to print the data in the
same locations from the rest of the lines.

For example,
LINE 1: ada.1 sds frd ada.2 rdfeb ada.rm ddfd
LINE 2: 1232 43232 123 1212 321 2321

OUTPUT 1232 123 321

Here is what I have so far (it is only draft work).
To make life easy, I have split the header line (first line)
and data lines into two files, while I get the logic/process
worked out.

My script so far is:
#!/bin/bash

for element in `awk '{ for (i=1;i<=NF;i++) {if ($i ~ /MAGresp/) print i}}' $1`

do

# idx rather than index: index is the name of a built-in awk function
awk -v idx=$element '{print $idx }' $2

done
----

But I don't like this. I can't figure out how to get awk to
do step 1 in my script, then read in the next lines and
print the values that are in the locations.

Can anyone provide me guidance on this problem?

John





Response Number 1
Name: James Boothe
Date: January 9, 2008 at 08:12:44 Pacific
Subject: awk: prnt field val in 2nd line
Reply:

So you want to identify each field in LINE 1 that matches a certain regexp, then print the corresponding fields in lines 2 thru end? So if we identify fields 2 and 5 on LINE 1, then we would want to print fields 2 and 5 from remainder of the file?

Your sample OUTPUT shows that you have printed fields 1 3 and 5. Looking at LINE 1, I cannot see how fields 1 3 and 5 would qualify for the same regexp. And I'm guessing that /MAGresp/ is just symbolic here?

It would help to understand the requirements if your posted sample data matched your code.



Response Number 2
Name: willdevv
Date: January 9, 2008 at 22:09:18 Pacific
Subject: awk: prnt field val in 2nd line
Reply:

Sorry for the confusion in my post.

Ln 1: RESP.1x2 OT.1 DNn.32. RESP-gm.13 HT.t
Ln 2: 12.2343 0.011 1.0101 31.03242 23.22
Ln 3: 10.3200 1.002 2.234 44.43432 12.12

Output needed: 12.2343 31.03242 10.3200 44.43432

-+-+-+-+-+-+-+-+-+-+-+-+

SCRIPT (So far):

#!/bin/bash

for element in `awk '{
for (i=1;i<=NF;i++) {
if ($i ~ /RESP/) print i
}
}' $1`

do

# idx rather than index: index is the name of a built-in awk function
awk -v idx=$element '{print $idx }' $2

done

-+-+-+-+-+-+-+-+-+-+-+-+

This works as long as I have 2 files, first is the header line,
and second file is the lines of data.

I would like to be able to generate this output using awk
and reading only 1 file because I need to process files that
will have 2k to 6k fields.

John




Response Number 3
Name: willdevv
Date: January 10, 2008 at 05:59:12 Pacific
Subject: awk: prnt field val in 2nd line
Reply:

I figured it out. Here is the code snippet:

NR == 1
{
for (i=1;i<=NF;i++)
{
x++;if ($i ~ /RESP/) indx[x] = i
}
}

NR != 1
{
for ( item in indx )
print $indx[item]
}




Response Number 4
Name: James Boothe
Date: January 10, 2008 at 09:37:21 Pacific
Subject: awk: prnt field val in 2nd line
Reply:

Nice.

FYI, when I run that code on linux as is, it does not produce the desired results.  The problem was:

NR==1
{do some stuff}

For my platform, the action statement needs to be on (or at least start on) the same line as the pattern/condition statement.  Or use line continuation:

NR==1 \
{do some stuff}

On separate lines like it was, they were not associated.  When the NR==1 or NR!=1 was true, not having an associated action statement, they would execute the default action of {print $0}.  And the action statements that followed were executing for each line unconditionally.
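
To illustrate the point, here is a minimal sketch of the earlier script with each opening brace moved onto the same line as its pattern. The sample data is taken from Response 2; the shell variables `data` and `result`, the counter `n`, and the `next` statement are assumptions added only to keep the sketch self-contained and in left-to-right order.

```shell
#!/bin/sh
# Sample data from Response 2 (header line plus two data lines).
data='RESP.1x2 OT.1 DNn.32. RESP-gm.13 HT.t
12.2343 0.011 1.0101 31.03242 23.22
10.3200 1.002 2.234 44.43432 12.12'

result=$(printf '%s\n' "$data" | awk '
NR == 1 {                                # brace on the pattern line ties the action to it
    for (i = 1; i <= NF; i++)
        if ($i ~ /RESP/) indx[++n] = i   # remember matching column numbers
    next                                 # print nothing for the header line
}
{
    for (k = 1; k <= n; k++)
        print $indx[k]                   # one selected value per output line
}
')
printf '%s\n' "$result"
```

With the brace on the pattern line, the header rule runs once and the second rule runs only for the data lines, so no line is echoed by the default action.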

Following is my solution.  Just to be different, instead of using NR conditions, I put the NR==1 logic in a BEGIN statement, so I had to explicitly read the first line with getline.

And instead of using for (item in indx), which will pull things out of the array in whatever order, I walk thru the stored array from 1 thru n.  This will maintain the left to right sequence of the fields, which I realize may not be a requirement in this case.

Finally, I chose to output all the fields from one line on a single line by not printing the newline character until finished with each line.  Or the newline could be delayed until all lines have been processed so that all of the fields would be printed on a single line, which is what you showed as Output needed.

awk 'BEGIN {
getline
for (i=1;i<=NF;i++)
   if ($i~/RESP/)
     {n++
      indx[n]=i}
}

{for (i=1;i<=n;i++ )
   printf "%s ",$indx[i]
 printf "\n"
}
' file.in


12.2343 31.03242
10.3200 44.43432
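
For the single-line form shown as Output needed in Response 2, one possible variant delays the newline until END. This is only a sketch: it uses an NR == 1 rule rather than getline, and the names `sep`, `data` and `result` are introduced here for illustration.

```shell
#!/bin/sh
# Sample data from Response 2.
data='RESP.1x2 OT.1 DNn.32. RESP-gm.13 HT.t
12.2343 0.011 1.0101 31.03242 23.22
10.3200 1.002 2.234 44.43432 12.12'

result=$(printf '%s\n' "$data" | awk '
NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i ~ /RESP/) indx[++n] = i   # matching columns, left to right
    next
}
{
    for (i = 1; i <= n; i++) {
        printf "%s%s", sep, $indx[i]     # sep is empty before the first value
        sep = " "
    }
}
END { printf "\n" }                      # single newline after all lines are done
')
printf '%s\n' "$result"
```

Printing the separator before each value, rather than after it, avoids a trailing blank at the end of the line.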



Response Number 5
Name: willdevv
Date: January 11, 2008 at 04:46:56 Pacific
Subject: awk: prnt field val in 2nd line
Reply:

Sorry James for the confusion again; I have this code in an
awk script. I find it easier to work with complex awk in a
script rather than the command line.

Now for the next bit that I am working on.

Now I need to keep a running average for each selected column
and then, if a value is greater than 8000, replace it with the
running average value.

I have been working on this today and am slugging
through it.

PS I liked the addition of the formatting of the output!

John




Response Number 6
Name: James Boothe
Date: January 11, 2008 at 08:55:17 Pacific
Subject: awk: prnt field val in 2nd line
Reply:

Here is my solution for that.

I changed your indx array to "col".

When processing each line, I pull each column value into a work variable called colval since I reference it several times.

As coded, each value is added to the running total for its column, even those that exceed 8000 and have the running average printed in their place. This means that you cannot just look at the printed output to confirm that the running average is being calculated correctly because the columns that exceed 8000 are being added into the running total but do not show on the output.

Maybe you want those values that exceed 8000 to NOT add to the running total?

Also, I am assuming that the columns are completely on their own.  For example, for a given line, one column exceeds 8000, thus processed under the alternate rules, while all other columns for that same line might be processed under normal rules.

For this posting, I printed fixed 4-decimal numbers.  And just for testing, the format mask includes an asterisk when a running average is used.

file.in:
RESP.1x2 OT.1 DNn.32. RESP-gm.13 HT.t
100 0 0 200 0
100 0 0 200 0
100 0 0 200 0
8005 0 0 200 0
100 0 0 200 0
100 0 0 200 0
100 0 0 200 0
100 0 0 9000 0
100 0 0 200 0
100 0 0 200 0


awk 'BEGIN {
getline
for (i=1;i<=NF;i++)
   if ($i~/RESP/)
     {n++
      col[n]=i}
}

{for (i=1;i<=n;i++ )
   {colval=$col[i]
    rtot[i]=rtot[i]+colval
    ravg=rtot[i]/(NR-1)
    if (colval>8000)
       printf "%11.4f* ",ravg
    else
       printf "%11.4f  ",colval
   }
 printf "\n"
}
' file.in


./john.sh
   100.0000     200.0000
   100.0000     200.0000
   100.0000     200.0000
  2076.2500*    200.0000
   100.0000     200.0000
   100.0000     200.0000
   100.0000     200.0000
   100.0000    1300.0000*
   100.0000     200.0000
   100.0000     200.0000
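
If values above 8000 should NOT be added to the running total, one possible sketch follows. It is an assumption-laden variant, not a tested replacement: the per-column counter `cnt`, the fallback average of 0 when no value has been accepted yet, the NR == 1 rule in place of getline, and the shell variables `data` and `result` are all introduced here for illustration.

```shell
#!/bin/sh
# Same test data as file.in above.
data='RESP.1x2 OT.1 DNn.32. RESP-gm.13 HT.t
100 0 0 200 0
100 0 0 200 0
100 0 0 200 0
8005 0 0 200 0
100 0 0 200 0
100 0 0 200 0
100 0 0 200 0
100 0 0 9000 0
100 0 0 200 0
100 0 0 200 0'

result=$(printf '%s\n' "$data" | awk '
NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i ~ /RESP/) col[++n] = i
    next
}
{
    for (i = 1; i <= n; i++) {
        colval = $col[i] + 0             # force a numeric value
        if (colval > 8000) {
            # Outlier: print the average of the values accepted so far
            # (0 if none yet) and leave the totals untouched.
            ravg = (cnt[i] > 0) ? rtot[i] / cnt[i] : 0
            printf "%11.4f* ", ravg
        } else {
            rtot[i] += colval            # only accepted values feed the average
            cnt[i]++
            printf "%11.4f  ", colval
        }
    }
    printf "\n"
}
')
printf '%s\n' "$result"
```

With this variant the starred entries for the test data come out as 100.0000* and 200.0000*, because the values 8005 and 9000 no longer inflate their columns' averages.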

