reformat file, Script Help -- Perl or Awk??

November 9, 2009 at 10:12:06
Specs: Linux
Hi-

I'm a programming newbie and could use a hint on how to get started with reformatting this file. I use AWK a lot and have done a bit of simple Perl programming too.

Here's a sample of the input file.

LNS 3
LIN 2
PTS 330715.6 4581794.5
PTS 334601.3 4584845.4
LNN 401
EOL
LIN 2
PTS 330672.2 4581887.5
PTS 336492.9 4586457.8
LNN 402
EOL
LIN 3
PTS 323504.7 4594949.6
PTS 324069.1 4595265.7
PTS 324030.4 4595362.4
LNN 576
EOL

"LNS 3" -- indicates that there are 3 records in this input file (each record starts with a LIN line and ends with an EOL line)
"LIN #" -- where # is the number of nodes in the record.
"PTS east north" -- this is a node with variables "east" and "north"
"LNN ###" -- where ### is the unique record number
"EOL" -- indicates the end of the given record.

I basically want the output file to look like this:

401 330715.6 4581794.5
401 334601.3 4584845.4
402 330672.2 4581887.5
402 336492.9 4586457.8
576 323504.7 4594949.6
576 324069.1 4595265.7
576 324030.4 4595362.4

so that each line of nodes contains the record number. The caveat I can't seem to figure out is that the number of nodes in each record varies (but is indicated by the LIN line, so if I could somehow count based on that variable that would work).

Any hints are welcome. I'm just looking for a pointer in the right direction.

Thanks!!




#1
November 9, 2009 at 10:40:13
Have you considered reversing the file? Then you can easily find the LNN number first, followed by the lines starting with PTS that belong to it. This can be done with the Bash shell.

If you need further help, I guess I can provide it.




#2
November 9, 2009 at 10:47:41
@nails:
Thanks for the suggestion. Then maybe I can ignore the LNS, LIN and EOL lines and just append the LNN line to the subsequent PTS lines until I hit another LNN. I'll give that a try. Thanks!
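
A minimal sketch of that reverse-and-tag idea, assuming GNU `tac` is available (it ships with coreutils on most Linux systems). The file names `sample.txt` and `reversed_out.txt` are made up for illustration. Once the file is reversed, each LNN line comes before its PTS lines, so awk only has to remember the most recent LNN value:

```shell
# Sample input in the format described in the question
cat > sample.txt <<'EOF'
LNS 3
LIN 2
PTS 330715.6 4581794.5
PTS 334601.3 4584845.4
LNN 401
EOL
LIN 2
PTS 330672.2 4581887.5
PTS 336492.9 4586457.8
LNN 402
EOL
LIN 3
PTS 323504.7 4594949.6
PTS 324069.1 4595265.7
PTS 324030.4 4595362.4
LNN 576
EOL
EOF

# Reverse the file so each LNN line precedes its PTS lines,
# tag every PTS line with the most recent LNN number,
# then reverse again to restore the original order.
tac sample.txt |
  awk '$1 == "LNN" { id = $2 }
       $1 == "PTS" { print id, $2, $3 }' |
  tac > reversed_out.txt

cat reversed_out.txt
```

The LNS, LIN and EOL lines are simply ignored here, so nothing checks that the node counts are right.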


#3
November 9, 2009 at 11:08:18
You are welcome. Here is an article describing 8 ways to reverse a file in Unix/Linux:

http://www.theillien.com/Sys_Admin_...



#4
November 9, 2009 at 11:18:15
I'm sure I've seen this same question in the recent past, which tells me that this is a homework assignment.

What have you tried?

Which language do you really need to use?

As a hint toward a Perl solution, you can read the file in record mode (in chunks rather than line by line) and use a regex to extract the data within each record.

Read up on the $/ ($INPUT_RECORD_SEPARATOR) var in `perldoc perlvar`



#5
November 9, 2009 at 11:34:53
I wish this were a homework assignment; that would mean I had a teacher to consult with these sorts of questions. I'm just a scientist and a very novice programmer. I'm more comfortable writing AWK scripts, but I've had some luck modifying existing Perl code, which I find to be far more robust.

I'll check out the $INPUT_RECORD_SEPARATOR in Perl. Thanks for the hint!



#6
November 9, 2009 at 11:53:58
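#!/usr/bin/perl

use strict;
use warnings;

# Read the input in chunks that end at each "LIN " instead of line by line.
$/ = 'LIN ';
foreach my $record ( <> ) {
    # Grab the record number; skip chunks without one (e.g. the LNS header).
    my ($LNN) = $record =~ /LNN (\d+)/ or next;
    # In list context, /g returns every captured "east north\n" string.
    # The loop aliases each one to $_ ($1 would only hold the last match).
    print "$LNN $_" for $record =~ /PTS (.+\n)/g;
}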

============

Executed as:
./geolack.pl < input > output



#7
November 9, 2009 at 13:50:02
Wow, thank you so much!! That does exactly what I wanted. And it probably would have taken me the rest of the day (and then some) to figure that out.

Am I reading this right??

$/ = 'LIN ';
#sets the "chunks" of data starting at each LIN line

my ($LNN) = $record =~ /LNN (\d+)/ or next;
#defines the variable LNN when it gets the pattern LNN DDD (where DDD is any num of digits)

print "$LNN $_" for $record =~ /PTS (.+\n)/g;
#prints the lines when the line has the pattern PTS with the value that its holding for LNN pre-pended to the start of the line.

What is the /PTS (.+\n)/g; part? Is that how you got rid of the PTS and just keep the digits from columns 2 & 3?

Thanks so much for the help!!



#8
November 9, 2009 at 18:42:35
Yes, your understanding of the script is correct.

This script has no error handling, which IMO is one of the most important aspects of a script, but it's a starting point.
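
As a rough illustration of what such checks could look like, here is a sketch in awk (not a port of the Perl script above). It compares the number of PTS lines with the count declared on each LIN line, and the number of records with the count on the LNS line. The file names (`check.awk`, `sample.txt`, `bad.txt`, and so on), the error wording and the choice to skip a bad record entirely are all assumptions made for this example:

```shell
cat > sample.txt <<'EOF'
LNS 3
LIN 2
PTS 330715.6 4581794.5
PTS 334601.3 4584845.4
LNN 401
EOL
LIN 2
PTS 330672.2 4581887.5
PTS 336492.9 4586457.8
LNN 402
EOL
LIN 3
PTS 323504.7 4594949.6
PTS 324069.1 4595265.7
PTS 324030.4 4595362.4
LNN 576
EOL
EOF

cat > check.awk <<'AWK'
# Report a problem on stderr and remember that something went wrong.
function err(msg) { print "error: " msg > "/dev/stderr"; bad = 1 }

$1 == "LNS" { declared_records = $2; next }
$1 == "LIN" { declared_nodes = $2; n = 0; id = ""; next }
$1 == "PTS" { pts[++n] = $2 " " $3; next }   # buffer nodes until EOL
$1 == "LNN" { id = $2; next }
$1 == "EOL" {
    records++
    if (id == "")
        err("record ending at line " NR " has no LNN line")
    else if (n != declared_nodes)
        err("record " id " (ending at line " NR "): found " n " PTS lines, LIN declared " declared_nodes)
    else
        for (i = 1; i <= n; i++) print id, pts[i]
    next
}
NF { err("line " NR ": unrecognized line: " $0) }   # any other non-blank line

END {
    if (records != declared_records)
        err("found " (records + 0) " records, LNS declared " declared_records)
    exit (bad + 0)   # non-zero exit status if any check failed
}
AWK

# Well-formed input: all seven nodes are written, exit status is 0.
awk -f check.awk sample.txt > checked_out.txt
cat checked_out.txt

# Malformed input: LIN declares 3 nodes but only 1 PTS line follows.
printf 'LNS 1\nLIN 3\nPTS 1 2\nLNN 9\nEOL\n' > bad.txt
awk -f check.awk bad.txt > /dev/null 2> bad_err.txt || echo "malformed input rejected"
```

Skipping a bad record rather than printing a partial one is only one possible policy; stopping at the first error would be just as reasonable.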



#9
November 10, 2009 at 17:09:00
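Here's a gawk solution:
awk 'BEGIN { RS = "EOL" }    # multi-character RS: each record ends at "EOL" (gawk extension)
{
  u = $NF                    # the last field of the record is the LNN number
  m = split($0, s, "\n")     # break the record into its lines
  for (i = 1; i <= m; i++) {
    if (s[i] ~ /^PTS /) {
      sub(/^PTS /, "", s[i]) # drop the "PTS " tag, keep east and north
      print u, s[i]
    }
  }
}' file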

output
$ ./shell.sh
401 330715.6 4581794.5
401 334601.3 4584845.4
402 330672.2 4581887.5
402 336492.9 4586457.8
576 323504.7 4594949.6
576 324069.1 4595265.7
576 324030.4 4595362.4
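
POSIX leaves the behavior of a multi-character RS unspecified (gawk treats it as a regular expression), so the script above may not behave the same in every awk. A sketch of a line-by-line alternative that should work in any POSIX awk: buffer the PTS lines and print them once the LNN line arrives. The file names `sample2.txt` and `portable_out.txt` are made up for this example, and the sample is shortened to two records:

```shell
cat > sample2.txt <<'EOF'
LNS 2
LIN 2
PTS 330715.6 4581794.5
PTS 334601.3 4584845.4
LNN 401
EOL
LIN 3
PTS 323504.7 4594949.6
PTS 324069.1 4595265.7
PTS 324030.4 4595362.4
LNN 576
EOL
EOF

awk '
$1 == "PTS" { pts[++n] = $2 " " $3 }       # remember each node of the current record
$1 == "LNN" {                              # record number reached: flush the buffer
    for (i = 1; i <= n; i++) print $2, pts[i]
    n = 0
}' sample2.txt > portable_out.txt

cat portable_out.txt
```

This relies on the LNN line always following the PTS lines of its record, as in the sample input.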




#10
November 19, 2009 at 06:58:29
Thanks ghostdog. I'll try your gawk method too.

Thanks everyone for the helpful responses.

