
Awk to parse and output


Original Message
Name: DeeDogg
Date: September 12, 2005 at 09:55:42 Pacific
Subject: Awk to parse and output
OS: Sun OS
CPU/Ram: sparc9 2Gigs Rams
Comment:

Hi,
I have a list of items separated by semicolons. Is it possible to write each item to a separate file using awk (or nawk)?

Thanks in advance.
-Jean

Example:
item#1
aaa
bbb
ccc
;
item#2
aaa
bbb
ccc
;
item#3
aaa
bbb
ccc
;
....





Response Number 1
Name: Luke Chi
Date: September 12, 2005 at 10:55:07 Pacific
Reply:

program:

awk -F";" ' { print $1 > "item1.txt"; print $2 > "item2.txt"; print $3 > "item3.txt"; } ' input.txt

input.txt:

1;2;3
a;b;c
first;second;third

output:

item1.txt:
1
a
first

item2.txt:
2
b
second

item3.txt:
3
c
third


Luke Chi



Response Number 2
Name: DeeDogg
Date: September 12, 2005 at 11:36:14 Pacific
Reply:

Hi Luke,
Thanks for the reply.
I was looking for:
item1.txt:
aaa
bbb
ccc
;

then item2.txt:
aaa
bbb
ccc
;
etc...


I don't want to separate the words on the lines, but rather the paragraphs, which are separated by the semicolon.

Thanks,
Jean



Response Number 3
Name: Jim Boothe
Date: September 12, 2005 at 12:15:15 Pacific
Reply:

The csplit command can come in handy here.  The command below splits infile into separate files named outfile00, outfile01, and so on, using each line that consists solely of a semicolon as the split point.  The {*} after the pattern tells csplit to repeat that pattern as many times as possible.

The problem with this solution is that the matched separator line becomes the first line of each new output file.  I think you want to discard that line, and I could not find a csplit option to do so.

csplit -f outfile infile '/^;$/' '{*}'

So here is an awk solution, coded two different ways; both produce the same file contents.

awk '
BEGIN {seq=1
       fileout="outfile.001"}
{if ($0==";")
   {close(fileout)      # close() is a function and needs parentheses
    seq++
    fileout=sprintf("outfile.%3.3d",seq)
    next}               # skip the separator line itself
 print > fileout
}' infile


awk '
BEGIN {seq=1
       fileout="fileout.001"}
/^;$/ {close(fileout)    # finish the current file at each separator line
       seq++
       fileout=sprintf("fileout.%3.3d",seq)
       next}             # awk has no open; print > creates the file
{print > fileout}' infile
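
For comparison, here is a minimal sketch of the layout requested in Response 2, where the ";" line stays at the end of each output file instead of being dropped. The sample data, the scratch directory, and the item1.txt, item2.txt naming are assumptions for illustration only, and on Solaris nawk may be needed in place of awk.

```shell
#!/bin/sh
# Illustrative demo: work in a scratch directory so nothing is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input modeled on the original question (contents are made up).
cat > input.txt <<'EOF'
item#1
aaa
bbb
ccc
;
item#2
ddd
eee
;
EOF

# Every line is written to the current file. A line that is exactly ";"
# is written as well; then the file is closed and the counter advances.
awk '
BEGIN { seq = 1; out = "item" seq ".txt" }
{ print > out }
$0 == ";" { close(out); seq++; out = "item" seq ".txt" }
' input.txt
```

Because the next file name is only computed, never written to, until another line arrives, a trailing ";" at the end of the input does not leave an empty extra file behind.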



Response Number 4
Name: Luke Chi
Date: September 12, 2005 at 12:41:12 Pacific
Reply:

On Redhat linux:

$ csplit -f item input.txt /\;/+1 {*}

input.txt:

1
2
3
;
a
b
c
;
first
second
third
;

output:

item00:

1
2
3
;

item01:

a
b
c
;

item02:

first
second
third
;

Note: csplit on my Solaris machine has a problem with {*}.

Luke Chi



Response Number 5
Name: Luke Chi
Date: September 12, 2005 at 13:12:21 Pacific
Reply:

The following is for Solaris:

CT=`grep ^\;$ input.txt | wc -l`
csplit -f item input.txt /\;/+1 {`expr $CT - 2`}

My HP machine is down, so I can't test it there at the moment.

Luke Chi




Response Number 6
Name: Jim Boothe
Date: September 13, 2005 at 07:00:21 Pacific
Reply:

But the csplit +1 offset still does nothing to get rid of the delimiter line.

The +1 tells csplit to split at the line following the matched line rather than at the matched line itself. So instead of each output file beginning with the delimiter line, each output file ends with it.
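
As a side note for readers on newer systems: GNU csplit (coreutils 8.22 and later, as far as I know) has a --suppress-matched option that drops the matched line. The sketch below assumes that GNU version is installed; it would not apply to the Solaris csplit discussed in this thread, and the sample file name and contents are made up for illustration.

```shell
#!/bin/sh
# Illustrative demo: work in a scratch directory so nothing is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input: three groups, each terminated by a line containing only ";".
printf '%s\n' 1 2 3 ';' a b c ';' first second third ';' > input.txt

# -s                  do not print the size of each output file
# -z                  remove any empty output file
# --suppress-matched  leave the matched ";" line out of the output (GNU only)
csplit -s -z --suppress-matched -f outfile input.txt '/^;$/' '{*}'
```

With this input, outfile00, outfile01 and outfile02 should each hold one group without the ";" line.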



Response Number 7
Name: DeeDogg
Date: September 13, 2005 at 07:06:02 Pacific
Reply:

Hi,
Thanks for the help. I went with the csplit command:
csplit -k -n{3} -fpin input.file '/^-/' '/;/+1' '{99}'

I told it where each item starts ('-') and where it ends (';'). The limitation is the repeat count of 99, since I have about 578 items. Before, I was getting an out of range error, so I added the -n option.

Thanks for the help Luke, Jim,
Jean
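
One way around a hard-coded repeat count is to count the separator lines first, along the lines of the Solaris approach in Response 5. The following is only a sketch under stated assumptions: the input ends with a ";" line, it contains at least two separators, the generated test data and the item prefix are invented for illustration, and it has not been tried on Solaris.

```shell
#!/bin/sh
# Illustrative demo: work in a scratch directory so nothing is overwritten.
cd "$(mktemp -d)" || exit 1

# Generate 150 made-up items, each terminated by a line containing only ";".
awk 'BEGIN { for (i = 1; i <= 150; i++) printf "item#%d\naaa\n;\n", i }' > input.txt

# Count the separator lines.
count=$(grep -c '^;$' input.txt)

# The pattern runs once plus (count - 2) repeats; the final item is what
# remains after the last split, so csplit never searches past end of file.
# -n 3 gives three-digit suffixes: item000, item001, ...
csplit -s -k -n 3 -f item input.txt '/^;$/+1' "{$((count - 2))}"
```

Asking for one repetition too many makes csplit look for a separator that is not there, which is the likely source of an out of range error; the three-digit suffix simply leaves room for up to 1000 files.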




Response Number 8
Name: Luke Chi
Date: September 13, 2005 at 10:35:47 Pacific
Reply:

Good!

Luke Chi

