Hi, I have some trouble with a tab-delimited txt file that I want to restructure somewhat. I have talked to some people on #perlhelp (EFnet), and one of them gave me a one-liner that does what I want, but I need to rewrite it as a script because I'm going to convert it to an .exe. Also, the one-liner doesn't make much sense to me as I'm pretty new to Perl, and I would really like to understand what I'm doing.
So, here's what I want to do with the txt file:
1. Skip all lines containing one or more of the following words: "CL08001 squib Salgsorganisation Salgskanal Division Bruger reklamatGrund Type". (This header occurs every 20th line.)
I tried with:
next if /^ZCL08001 squib Salgsorganisation Salgskanal Division Bruger reklamat Grund Type/;
but that didn't work for some reason.
2. In the file, there are multiple tabs between some of the strings. I want each run of multiple tabs replaced by a single tab.
I've tried "$_ =~ s/\t+/\t/;" (within a while (<FILE>) loop), but it only replaced some of the tabs for some reason.
3. I want to remove newlines (\n) until Perl reaches a blank line, because the data is usually spread over 3 lines, followed by 2 blank lines. I guess I can join the lines or remove the newlines somehow, but I'm a little unsure how to do this.
Also, all blank lines should be skipped.
I guess the one-liner does that here: "print join qq{\t}, split /[\n;\t]+/".
But how do I insert it into a while loop (and how does it work)?
while( <FILE> ) {
...
...
}
So the result file should look like this: multiple tabs replaced with just one tab, and newlines removed until a blank line is reached, then blank lines skipped until a line with text occurs again.
I hope this makes sense and that someone can help and perhaps explain the approach.
At the end, you can see the one-liner I got, which I want to rewrite and am trying to understand.
(oneliner)
perl -00 -nwle"next if /CL08001 squib Salgsorganisation Salgskanal Division Bruger reklamatGrund Type/; print join qq{\t}, split /[\n;\t]+/" foo.txt

Let's start by looking at the switches.
-00 read the file in paragraph mode (i.e., blocks separated by blank lines).
-n wrap the code in a while loop that goes through each record (here, each paragraph/block) of the file.
-w enable warnings.
-l chomp the record terminator on input and add a terminator back in the print statement.
-e execute the following Perl code.
=================================================================
Now let's look at the code.
next if /CL08001 squib Salgsorganisation Salgskanal Division Bruger reklamatGrund Type/;
Skip any paragraph that includes the header string. It must be an exact match, i.e., each of those words separated by a single space (not a tab).
print join qq{\t}, split /[\n;\t]+/
Let's read the print statement from right to left.
Split the string on runs of any of the following characters: \t tab, ; semicolon, or \n newline.
The results of the split are joined with \t tab characters and passed to the print command.
The string is printed and the line terminator is added (via the -l switch).
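As a small illustration of the split/join step described above, here is a sketch run on a made-up record. The sample data and the helper name flatten_paragraph are assumptions for the example, not taken from the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: flatten one paragraph the way the one-liner's
# "print join qq{\t}, split /[\n;\t]+/" does.
sub flatten_paragraph {
    my ($paragraph) = @_;
    # Split on runs of newline, semicolon or tab, then rejoin with one tab.
    return join "\t", split /[\n;\t]+/, $paragraph;
}

# Show tabs as a visible "\t" so the result can be read on screen.
sub show {
    my ($s) = @_;
    $s =~ s/\t/\\t/g;
    return $s;
}

# Made-up record spread over three lines, with repeated tabs.
my $sample = "1001\t\tWidget\n\t2007-03-01\t\t\tOpen\nsome note\n";
print show(flatten_paragraph($sample)), "\n";
```

Note that split drops trailing empty fields but keeps a leading one, so a paragraph that starts with a tab still produces a line that starts with a tab.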
=================================================================
I see possible problems with that one-liner.
It assumes that you have a blank line both before and after each of the header strings. If that's not the case, it will also skip some of the tab-separated data that you want to keep.
The header string is hard-coded in the regex, which looks for those words separated by a single space, not a tab.
Without seeing a sample of your real data, I can't be sure, but this may do what you need.
#!/usr/bin/perl
use strict;
use warnings;

$/ = "";
open (F, 'foo.txt') || die "open failed $!";
while(<F>) {
    # skip any block that contains the header string
    next if /CL08001 squib Salgsorganisation Salgskanal Division Bruger reklamatGrund Type/;
    # the one-liner's split/join, kept for reference:
    # print join qq{\t}, split /[\n;\t]+/;
    s/[\t\s]+/\t/g;
    print "$_\n";
}
close F;

Hi, and big thanks for the help, Fishmonger :)
I have some further questions I hope you can find time to answer.
$/ = ""; <-- Why this one??
open (F, 'foo.txt') || die "open failed $!";
while(<F>) {
next if /CL08001 squib Salgsorganisation Salgskanal Division Bruger reklamatGrund Type/;
# print join qq{\t}, split /[\n;\t]+/;
s/[\t\s]+/\t/g; <-- Could you give a little explanation of this one, as I cannot quite relate it to the one-liner you just explained.
print "$_\n";
}
close F;
The discarding of the header still doesn't work. The header looks like this (since this post won't show multiple tabs or newlines, I have written them out):
\t\tSalgsorganisation\t\t\t\t\t\t:\t29\t\tDANMARK\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n
\t\tSalgskanal\t\t\t\t\t\t:\t10\t\tEkstern\t\t\t\t\t\t\t\tFrom Date\t\t\t 01.03.2007\tTo Date\t\t31.03.2007 \t\t\t\t\t\t\t\t\n
So any line with one or more of the strings should just be discarded. Remember, not all of the strings are found on one line.
I've used system 'findstr /v "CL08......." >tmpfile' previously, and it works fine and very efficiently, but I don't see why I shouldn't do it all in Perl.
By the way, there is always a blank line before and after the header.

>> $/ = ""; <-- Why this one??
That has the same effect as the -00 switch.
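To see what paragraph mode does to the input, here is a minimal sketch that reads from a string instead of foo.txt. The helper name read_paragraphs and the sample text are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read all records from a string in paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    local $/ = "";    # paragraph mode, as with the -00 switch
    open my $fh, '<', \$text or die "open failed: $!";
    my @records = <$fh>;    # one element per block of non-blank lines
    close $fh;
    return @records;
}

# Two data lines, several blank lines, then one more data line.
my $data    = "line 1\nline 2\n\n\n\nline 3\n";
my @records = read_paragraphs($data);
print scalar(@records), " records\n";
```

Each record holds a whole block, embedded newlines included, and any run of blank lines between blocks counts as a single separator. That is why the while loop never sees a blank line on its own.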
>> s/[\t\s]+/\t/g; <-- Could you give a little explanation on this
That replaces each run of tabs and other whitespace with a single tab. It's very close, but not exactly what is being done in the split/join combination.
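A short sketch of two details behind that substitution, using made-up strings. Without the /g modifier only the first run on a line is replaced, which is probably why the earlier s/\t+/\t/ attempt only changed some of the tabs. And since \s also matches spaces, the substitution turns "From Date" into two fields, while the split/join version leaves spaces alone. The helper names here are assumptions for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of tabs to one tab.
sub collapse_tabs {
    my ($line) = @_;
    $line =~ s/\t+/\t/g;    # /g applies the substitution to all runs
    return $line;
}

# Show tabs as a visible "\t" so the results can be compared on screen.
sub show {
    my ($s) = @_;
    $s =~ s/\t/\\t/g;
    return $s;
}

my $line = "a\t\t\tb\t\tc";
(my $first_only = $line) =~ s/\t+/\t/;    # no /g: only the first run changes
print "without /g: ", show($first_only), "\n";
print "with /g:    ", show(collapse_tabs($line)), "\n";

my $text = "From Date\t\t01.03.2007";
(my $subst = $text) =~ s/[\t\s]+/\t/g;              # spaces become tabs too
my $joined = join "\t", split /[\n;\t]+/, $text;    # spaces are kept
print "s///:       ", show($subst), "\n";
print "split/join: ", show($joined), "\n";
```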
===================================================================
It sounds like the regex needs to be modified to use alternation instead of matching the complete exact string.
next if /(CL08001|squib|Salgsorganisation|Salgskanal|Division|Bruger|reklamatGrund|Type)/;
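Here is a sketch of how that alternation could be checked against sample records before running it on the real file. The header words come from this thread; the helper name is_header and the data record are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header words from the thread; any one of them marks a header record.
my @header_words = qw(CL08001 squib Salgsorganisation Salgskanal
                      Division Bruger reklamatGrund Type);

# Build the alternation once; quotemeta guards against regex metacharacters.
my $alternation = join '|', map { quotemeta($_) } @header_words;
my $header_re   = qr/(?:$alternation)/;

# Hypothetical helper: true if the record contains any header word.
sub is_header {
    my ($record) = @_;
    return $record =~ $header_re ? 1 : 0;
}

# First record mimics a header line, second is a made-up data record.
print is_header("\t\tSalgskanal\t\t:\t10\t\tEkstern\n") ? "skip\n" : "keep\n";
print is_header("1001\tWidget\tOpen\n")                 ? "skip\n" : "keep\n";
```

One thing to watch: short words such as Type or Division match anywhere in a record, so if they can also occur in real data, those records would be skipped as well. Wrapping the alternation in \b word boundaries, or dropping the ambiguous words, may be worth considering.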
I'm tied up the rest of the day, but if you want to email me a sample of your data file and how it should look after processing, I'll look at it tomorrow.
