Name: willdevv
Date: January 9, 2008 at 06:16:27 Pacific
Subject: awk: prnt field val in 2nd line
OS: Unix
CPU/Ram: core 2 duo 4 gb
Model/Manufacturer: Apple 17
Comment:
I need to find the field locations that match a regexp in the 1st line, then use those field numbers to print the data in the same locations from the rest of the lines.
For example,
LINE 1: ada.1 sds frd ada.2 rdfeb ada.rm ddfd
LINE 2: 1232 43232 123 1212 321 2321
OUTPUT: 1232 123 321
Here is what I have so far (it is only draft work). To make life easy, I have split the header line (first line) and the data lines into two files while I get the logic/process worked out.
My script so far is:
#!/bin/bash
# "index" is a built-in awk function name, so the variable is called col here
for element in `awk '{ for (i=1;i<=NF;i++) {if ($i ~ /MAGresp/) print i}}' $1`
do
awk -v col=$element '{print $col }' $2
done
----
But I don't like this. I can't figure out how to get awk to do step 1 in my script, then read in the next lines and print the values that are in the locations.
So you want to identify each field in LINE 1 that matches a certain regexp, then print the corresponding fields in lines 2 thru end? So if we identify fields 2 and 5 on LINE 1, then we would want to print fields 2 and 5 from remainder of the file?
Your sample OUTPUT shows that you have printed fields 1 3 and 5. Looking at LINE 1, I cannot see how fields 1 3 and 5 would qualify for the same regexp. And I'm guessing that /MAGresp/ is just symbolic here?
It would help to understand the requirements if your posted sample data matched your code.
FYI, when I run that code on linux as is, it does not produce the desired results. The problem was:
NR==1
{do some stuff}
For my platform, the action statement needs to be on (or at least start on) the same line as the pattern/condition statement. Or use line continuation:
NR==1 \
{do some stuff}
On separate lines like it was, they were not associated. When the NR==1 or NR!=1 was true, not having an associated action statement, they would execute the default action of {print $0}. And the action statements that followed were executing for each line unconditionally.
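To see the difference for yourself, here is a small sketch that runs the same rule three ways. The function names (detached, attached, continued) and the two input lines "a" and "b" are made up for the illustration; they are not from the posted script.

```shell
# Two input lines, "a" and "b", are fed to every variant (made-up sample data).

# 1. Pattern and action on separate lines: awk sees two independent rules.
#    NR==1 alone prints line 1, and the bare action runs for every line.
detached() {
    printf 'a\nb\n' | awk 'NR==1
{ print "action:", $0 }'
}

# 2. Opening brace on the same line as the pattern: one rule, line 1 only.
attached() {
    printf 'a\nb\n' | awk 'NR==1 { print "action:", $0 }'
}

# 3. Backslash-newline joins the two lines back into a single rule.
continued() {
    printf 'a\nb\n' | awk 'NR==1 \
{ print "action:", $0 }'
}

detached    # prints "a", then "action: a", then "action: b"
attached    # prints "action: a"
continued   # prints "action: a"
```

The first variant shows both symptoms at once: the extra unformatted copy of line 1 comes from the default print, and the action fires on every line.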
Following is my solution. Just to be different, instead of using NR conditions, I put the NR==1 logic in a BEGIN statement, so I had to explicitly read the first line with getline.
And instead of using for (item in indx), which will pull things out of the array in whatever order, I walk thru the stored array from 1 thru n. This will maintain the left to right sequence of the fields, which I realize may not be a requirement in this case.
Finally, I chose to output all the fields from one line on a single line by not printing the newline character until finished with each line. Or the newline could be delayed until all lines have been processed so that all of the fields would be printed on a single line, which is what you showed as Output needed.
awk 'BEGIN {
    getline
    for (i=1;i<=NF;i++)
        if ($i~/RESP/) {
            n++
            indx[n]=i
        }
}
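The command above breaks off after the BEGIN block. What follows is only a sketch of how the whole thing might look, pieced together from the description: read the header with getline, store the matching positions in indx[1..n], then walk the array in order for each data line and hold back the newline until the line is finished. The function name select_cols, the re variable, and the sample header and data are assumptions for the illustration (the header is adjusted so that fields 1, 3 and 5 match).

```shell
# Sketch only: select_cols is a hypothetical name; the regexp is passed as $1.
# Input (header line first) is read from standard input.
select_cols() {
    awk -v re="$1" '
    BEGIN {
        getline                               # read the header line explicitly
        for (i = 1; i <= NF; i++)
            if ($i ~ re) { n++; indx[n] = i } # remember matching positions, left to right
    }
    {
        # The header was consumed in BEGIN, so this runs for data lines only.
        for (k = 1; k <= n; k++)
            printf "%s%s", $(indx[k]), (k < n ? " " : "")
        printf "\n"                           # newline only after the whole line is done
    }'
}

# Made-up sample: header fields 1, 3 and 5 start with "ada".
printf '%s\n' 'ada.1 sds ada.2 frd ada.rm' \
              '1232 43232 123 1212 321' \
              '7 8 9 10 11' | select_cols '^ada'
```

With that sample input the two data lines come out as "1232 123 321" and "7 9 11". Moving the final printf "\n" into an END block would put everything on one line instead.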
Sorry James for the confusion again; I have this code in an awk script. I find it easier to work with complex awk in a script rather than the command line.
Now for the next bit that I am working on.
Now I need to keep a running average for each array and then, if a value is greater than 8000, replace it with the running average value.
I have been working on this today and am slugging through it.
PS I liked the addition of the formatting of the output!
When processing each line, I pull each column value into a work variable called colval since I reference it several times.
As coded, each value is added to the running total for its column, even those that exceed 8000 and have the running average printed in their place. This means that you cannot just look at the printed output to confirm that the running average is being calculated correctly because the columns that exceed 8000 are being added into the running total but do not show on the output.
Maybe you want those values that exceed 8000 to NOT add to the running total?
Also, I am assuming that the columns are completely on their own. For example, for a given line, one column exceeds 8000, thus processed under the alternate rules, while all other columns for that same line might be processed under normal rules.
For this posting, I printed fixed 4-decimal numbers. And just for testing, the format mask includes an asterisk when a running average is used.
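Since the code itself is not shown here, this is a rough sketch of the logic described, not the original script. Assumptions: the input already contains only the columns of interest, the running average includes the current value when that value is added to the total, and the names running_avg, limit and addall are invented for the example. Passing 0 as the first argument tries the alternative where values over 8000 are left out of the running total.

```shell
# Sketch only: running_avg, limit and addall are hypothetical names.
# $1 = 1 (default): every value is added to its column total, even those over the limit.
# $1 = 0: values over the limit are not added to the column total.
running_avg() {
    awk -v limit=8000 -v addall="${1:-1}" '
    {
        line = ""
        for (c = 1; c <= NF; c++) {
            colval = $c + 0                   # work variable, referenced several times
            over = (colval > limit)
            if (addall + 0 || !over) {        # each column keeps its own total and count
                sum[c] += colval
                cnt[c]++
            }
            if (over && cnt[c] > 0)
                out = sprintf("%.4f*", sum[c] / cnt[c])  # asterisk marks a substituted average
            else
                out = sprintf("%.4f", colval)
            line = line (c > 1 ? " " : "") out
        }
        print line
    }'
}

# Made-up sample: the 9000 in column 1 of line 2 is replaced.
printf '100 200\n9000 300\n' | running_avg 1   # second line: 4550.0000* 300.0000
printf '100 200\n9000 300\n' | running_avg 0   # second line: 100.0000* 300.0000
```

Comparing the two runs shows the point made above: with every value added to the total, the 9000 pulls the printed average up to 4550 even though the 9000 itself never appears in the output.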