
Sed/Awk Script


Original Message
Name: ajw85
Date: March 8, 2005 at 03:38:08 Pacific
Subject: Sed/Awk Script
OS: SunOS5
CPU/Ram: Sparc
Comment:

Hi,
I need a script to remove carriage returns that occur at the 84th character position.

The files are DOS formatted at the moment.

Can anyone help?

Thanks,

Andy

EXAMPLE File:

Spaces replaced with underscores for clarity.


012345678901234567890123456789012345678901234567890123456789012345678901234

DATA_FILE
--------------
_________503________2004__________94________1811________1091___35.00000____
___94.7580439814815_______________37__-6999.900______-6999.900____
__-6999.900______-6999.900______-6999.900____
_________503________2004__________96_________126__________86___35.88000____
___96.0601375000124________________3__-6999.900______-6999.900____
__-6999.900______-6999.900______-6999.900____
_________503________2004_________104________1805________1085___47.25000____
___104.754019097222_______________37__-6999.900______-6999.900____
__-6999.900______-6999.900______-6999.900____
_________503________2004_________111________1914________1154__-16.00000____

---------------




Response Number 1
Name: David Perry
Date: March 8, 2005 at 07:16:08 Pacific
Subject: Sed/Awk Script
Reply:

Does the location of CR matter? If not,

tr -d '\013' < infile > outfile

although I would expect you actually want to replace the CR with something like a line feed.
Does your host have dos2unix?

dos2unix dosfile



Response Number 2
Name: Jim Boothe
Date: March 8, 2005 at 07:52:35 Pacific
Subject: Sed/Awk Script
Reply:

A straight transfer from DOS will have CRLF terminating each line. So the LFs are already there, and you just need to remove the CRs at the end of each line.

A CR is decimal 13, but the syntax above specifies octal, so you will need '\015'
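A minimal sketch of the corrected command, assuming a POSIX tr (which reads standard input rather than file operands and needs -d to delete characters). The helper name strip_cr is hypothetical, chosen only for illustration:

```shell
# strip_cr: hypothetical helper; deletes every CR (octal 015) from stdin.
strip_cr() {
    tr -d '\015'
}

# Demo on a two-line DOS-style sample (CRLF line endings).
printf 'one\r\ntwo\r\n' | strip_cr
```

On a real file this would be run as: strip_cr < infile > outfile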



Response Number 3
Name: ajw85
Date: March 9, 2005 at 01:42:45 Pacific
Subject: Sed/Awk Script
Reply:

Further info.

The text files are now UNIX formatted. They are very long 20M lines. They were produced by a fixed-format Fortran program (PC Compaq Visual), hence the text wrap at column 84. I would like to remove the carriage returns induced by the fixed format but keep the end-of-record carriage return, so that every line begins with the '503' field (see the example given).

Thanks,

Andy



Response Number 4
Name: Jim Boothe
Date: March 9, 2005 at 07:45:13 Pacific
Subject: Sed/Awk Script
Reply:

OK, here's the way I understand your problem:

The Fortran program outputs very long 20M logical lines, but something causes it to output these broken down into a series of 84-character lines.

And you would like to join the short lines together to create long 20M lines?

My awk cannot read lines longer than about 3000 characters, but if I am correct about your situation, most of your lines are 84 characters in length. Run these two commands on your file to see the total number of lines and the number of lines that are NOT 84 characters in length:

wc -l fortran.out
awk 'length!=84' fortran.out|wc -l
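If more than a couple of distinct lengths turn up, a tally of every line length may be more informative than the single count above. A hedged sketch; len_tally is a hypothetical helper name, not part of any standard tool:

```shell
# len_tally: hypothetical helper; prints "length count" for each distinct
# line length seen on stdin, sorted numerically by length.
len_tally() {
    awk '{ n[length($0)]++ } END { for (len in n) print len, n[len] }' | sort -n
}

# Demo: two 3-character lines and one 5-character line.
printf 'abc\nabcde\nxyz\n' | len_tally
```

On the real file this would be run as: len_tally < fortran.out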

It would take almost 250,000 lines of 84 characters each to create one long 20M line.

Although awk cannot read an extremely long line, it can construct a very long output line. This would be an easy task for awk, with the only problem in identifying precisely our stopping point for each logical line. We cannot control on a field value such as 503 since that might show up in field 1 of any of the 250,000 lines.

We might be able to control on physical line length. If the Fortran program always outputs, for each logical record, 249659 lines of length 84 followed by one line of length 62, then we could control on that. Or we could control on number of lines.

And by the way, Unix lines are terminated with "newline" characters (also called linefeed or LF characters), which is Ctrl-J or octal 12 or decimal 10. Carriage returns in Unix do not cause line breaks. They are just another character (Ctrl-M or octal 15 or decimal 13).
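To see which of these characters a file actually contains, the bytes can be dumped with od -c, which shows CR as \r and LF as \n. A small sketch; show_bytes is a hypothetical wrapper name used only for this illustration:

```shell
# show_bytes: hypothetical helper; dumps stdin one character at a time,
# printing CR as \r and LF as \n.
show_bytes() {
    od -c
}

# Demo: the first line ends in CRLF (DOS style), the second in LF only.
printf 'ab\r\ncd\n' | show_bytes
```

For a very large file, dumping only the first few lines is enough: head -3 fortran.out | show_bytes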



Response Number 5
Name: ajw85
Date: March 9, 2005 at 09:54:10 Pacific
Subject: Sed/Awk Script
Reply:

Hi Jim,

Thanks for your help. I think we are getting close to understanding the problem. The Fortran program indeed creates lines only 84 characters long (the LF always occurs at character 84, unless moving to a new record). However, I don't want to join them into really long lines, just so that there is one record per line.

Each record contains 13 fields, which at the moment are split over 3 lines. Field_1 always contains the number 503.

Current state

F_1 F_2 F_3 F_4 F_5 F_6 LF <RECORD_1>
F_7 F_8 F_9 F_10 LF <RECORD_1>
F_11 F_12 F_13 LF <RECORD_1>
F_1 F_2 F_3 F_4 F_5 F_6 LF <RECORD_2>
F_7 F_8 F_9 F_10 LF <RECORD_2>
F_11 F_12 F_13 LF <RECORD_2>
...ETC

Desired State

F_1 F_2 F_3 F_4 F_5 F_6 F_7 F_8 F_9 F_10 F_11 F_12 F_13 LF <RECORD_1>
F_1 F_2 F_3 F_4 F_5 F_6 F_7 F_8 F_9 F_10 F_11 F_12 F_13 LF <RECORD_2>
...ETC

The text file actually contains 11.5M records, each spread over 3 lines, thus taking up a total of 34.5M lines. I want the end file to contain 11.5M lines, one record per line.

So, an awk script that searches for a line feed that occurs as the 84th character and removes it would do the job perfectly, as long as it doesn't remove any line feed that occurs at any character location other than 84.

Thanks for your help,

Andy




Response Number 6
Name: Jim Boothe
Date: March 9, 2005 at 13:01:35 Pacific
Subject: Sed/Awk Script
Reply:

OK, so you want each set of three lines joined into one. Either of the two commands below will do that. For the volume of lines that you have, go with the sed solution, which will be much more efficient.

sed 'N;N;s/\n//g' filein > fileout

awk '{
out=$0
getline
out=out $0
getline
out=out $0
print out
}' filein > fileout
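Both commands above count lines, so a single record that is not exactly three lines long would misalign everything after it. As a hedged alternative, the sketch below starts a new output line whenever the first field is 503, as described earlier in the thread. It assumes that no continuation line ever has 503 as its first field, which should be checked against the real data. The helper name join_records is hypothetical:

```shell
# join_records: hypothetical helper; joins physical lines into one record
# per output line, starting a new record whenever field 1 is 503.
# Assumption: continuation lines never have 503 as their first field.
join_records() {
    awk '
        $1 == "503" && NR > 1 { printf "\n" }   # new record: end the previous output line
        { printf "%s", $0 }                      # append this physical line, no newline
        END { if (NR > 0) printf "\n" }          # terminate the final record
    '
}

# Demo on two shortened three-line records.
printf ' 503 2004 94\n 94.75 37\n -6999.9\n 503 2004 96\n 96.06 3\n -6999.9\n' | join_records
```

On the real file this would be run as: join_records < filein > fileout. Afterwards, wc -l fileout should report one third of the line count of filein if every record really spans three lines.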



Response Number 7
Name: ajw85
Date: March 10, 2005 at 06:36:15 Pacific
Subject: Sed/Awk Script
Reply:

That did the job brilliantly. Thanks very much.

Andy

