Perl, Delete duplicates in array


Name: Shr0Om
Date: August 24, 2007 at 03:45:09 Pacific
OS: XP
CPU/Ram: amd 64 3200
Product: custom
Comment:

I need help deleting only certain duplicate values in an array.
I know how to delete all duplicates in the entire array, but the array may also contain duplicates that are supposed to be there, so I first need to filter out the lines I want to compare.
The array looks something like this:

Grund til reklamat 1
data
data
Grund til reklamat 1
data
data
data
Grund til reklamat 2
data
Grund til reklamat 2
data
data
Grund til reklamat 3
.
.
The result should instead look like this:
Grund til reklamat 1
data
data
data
data
data
Grund til reklamat 2
data
data
data
Grund til reklamat 3
.
.
I've got most of the script complete,
but I'm not sure how to remove only the duplicates containing the string "Grund til reklamat".
Here's my script so far:

#Script start
unlink 'Result.csv'; # quoted: the bareword result.csv would be parsed as a string concatenation
use strict;

my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l);
$/ = "";
my @dir = <*.csv>;
print "Behandler: $dir[0]\n";
my @data;

open (F, "$dir[0]") || die "open failed $!";
open (FO, '>Result.csv') || die "open failed $!";
print FO "Return;Customer;New;Product Nr;Product description;New date;Original;Old date;Creator\n";


while(<F>) {
next if /(CL08001|squib|Salgsorganisation|Salgskanal|Division|Bruger|Type)/; # Discarding some headings
s/[\t\n]+/\t/g;
s/\t+$//;
s/['"']+//g;
($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = split ("\t", $_);
my ($first,$rest) = split (/\s/, $i,2 );
push(@data,"$a;$b;$c;$first;$rest;$d;$e;$f;$g\n");
}

foreach (@data)
{
$i++;
if($_ =~ m/Grund til reklamat/)
{
#Delete all duplicates containing string "Grund til reklamat" here.
}
}

print FO @data;
close F;
close FO;




Response Number 1
Name: Mechanix2Go
Date: August 26, 2007 at 12:55:25 Pacific
Reply:

I dunno perl, but this bat should do it.

========================================
@echo off > new.a
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%a in (my.a) do (
echo %%a | find "Grund til reklamat" > nul & if errorlevel 1 (
echo echo %%a >> new.a
) else (
find "%%a" < new.a > nul & if errorlevel 1 echo %%a >> new.a
)
)



=====================================
If at first you don't succeed, you're about average.

M2




Response Number 2
Name: Mechanix2Go
Date: August 26, 2007 at 15:51:08 Pacific
Reply:

***************** CORRECTION **********

@echo off > new.a
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%a in (my.a) do (
echo %%a | find "Grund til reklamat" > nul & if errorlevel 1 (
echo %%a >> new.a
) else (
find "%%a" < new.a > nul & if errorlevel 1 echo %%a >> new.a
)
)







Response Number 3
Name: FishMonger
Date: August 27, 2007 at 11:25:00 Pacific
Reply:

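my %seen;    # counts how often each heading line has appeared

while (<DATA>) {
    if (/^Grund til reklamat/) {
        # Print a heading only the first time it is seen.
        $seen{$_}++;
        print if $seen{$_} < 2;
    }
    else {
        # Every other line is printed unchanged.
        print;
    }
}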


__DATA__
Grund til reklamat 1
data
data
Grund til reklamat 1
data
data
data
Grund til reklamat 2
data
Grund til reklamat 2
data
data
Grund til reklamat 3
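
The loop above reads from a filehandle, whereas the original question keeps its lines in an array. A minimal sketch of the same %seen idea applied to an array might look like the following. The sub name dedupe_headings and the sample data are made up for illustration, and it assumes headings can be recognised by the literal prefix "Grund til reklamat".

```perl
use strict;
use warnings;

# Hypothetical helper: keep every non-heading line, but only the first
# occurrence of each distinct heading line.
sub dedupe_headings {
    my @lines = @_;
    my %seen;
    return grep { !/^Grund til reklamat/ || !$seen{$_}++ } @lines;
}

# Invented sample data in the shape shown in the question.
my @data = (
    "Grund til reklamat 1\n", "data\n", "data\n",
    "Grund til reklamat 1\n", "data\n",
    "Grund til reklamat 2\n", "data\n",
    "Grund til reklamat 2\n", "data\n",
);

# Prints each heading once, with all data lines left in place.
print dedupe_headings(@data);
```

Non-heading lines short-circuit past the %seen test, so repeated data lines survive and only repeated headings are filtered.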



Response Number 4
Name: Guy
Date: August 27, 2007 at 18:28:06 Pacific
Reply:

"Array looks somewhat like ....."

Hmmmmmm.

Is it also possible the array looks like:

Grund til reklamat 1
data
data
Grund til reklamat 2
data
Grund til reklamat 1
data
data
data
Grund til reklamat 2
data
data
Grund til reklamat 3

????

If so, my initial reaction is that the above solutions will not work.

Guy
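
If headings can be interleaved as in the example above, dropping the repeated heading lines would leave the data that followed them under the wrong heading. One possible approach, sketched here under the assumption that each data line belongs to the most recent heading above it, is to collect the lines per heading and emit the groups in order of first appearance. The sub name group_by_heading is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: gather the lines under each heading, merging
# repeated headings even when other headings appear in between.
sub group_by_heading {
    my @lines = @_;
    my (%group, @order, $current, @out);
    for my $line (@lines) {
        if ($line =~ /^Grund til reklamat/) {
            $current = $line;
            # Remember the order in which headings first appear.
            push @order, $line unless exists $group{$line};
            $group{$line} ||= [];
        }
        elsif (defined $current) {
            push @{ $group{$current} }, $line;
        }
        else {
            push @out, $line;    # lines before the first heading pass through
        }
    }
    # Emit each heading once, followed by all the lines collected for it.
    push @out, $_, @{ $group{$_} } for @order;
    return @out;
}

# Invented sample: heading 1 is interrupted by heading 2.
print group_by_heading(
    "Grund til reklamat 1\n", "a\n",
    "Grund til reklamat 2\n", "b\n",
    "Grund til reklamat 1\n", "c\n",
);
```

This holds the whole input in memory, which should be fine for a file of a few thousand lines.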




Response Number 5
Name: Mechanix2Go
Date: August 27, 2007 at 20:32:10 Pacific
Reply:

Guy,

yep










Response Number 6
Name: Shr0Om
Date: August 29, 2007 at 06:04:26 Pacific
Reply:

Hi, thanks to all for the responses.
I've got a solution now.
The array headings are always in sorted order, so that isn't a problem.
The solution was to add this to my script:
if ($_ =~ m/Grund til reklamat/) {
    my %seen; @data = grep { !$seen{$_}++ } @data; # Added this
}

I kind of know what it does, though I have to admit I haven't read up on 'grep' yet. For now I just wanted to finish this script. :P

#Entire script.. Ugly, but works..
#!/usr/bin/perl
unlink 'Result.csv'; # Delete any earlier Result file
use strict;
my @data;
my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l);
$/ = ""; # Paragraph mode: read block by block
my @dir = <*.csv>;
unless (@dir) { # @dir is empty
    print "Ingen csv filer funnet!\n"; # "No csv files found!"
    sleep 2;
    exit;
}
print "Behandler: $dir[0]\nVennligst vent...\n"; # "Processing: <file>, please wait..."

open (F, "$dir[0]") || die "open failed $!";
open (FO, '>Result.csv') || die "open failed $!";
print FO "Return;Customer;New;Product Nr;Product description;New date;Original;Old date;Creator\n";

while (<F>) {
    next if /(CL08001|squib|Salgsorganisation|Salgskanal|Division|Bruger|Type)/; # Discard lines containing any of these strings
    s/[\t\n]+/\t/g; # Clean up the messy csv file
    s/\t+$//;
    s/['"']+//g;
    ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = split ("\t", $_);
    my ($first,$rest) = split (/\s/, $i, 2);
    push(@data,"$a;$b;$c;$first;$rest;$d;$e;$f;$g\n");
}
close F;
foreach (@data)
{
    if ($_ =~ m/Grund til reklamat/)
    {
        # Meant to weed out redundant array entries containing the string "Grund til reklamat".
        # Note: as written, the grep filters all of @data, so every repeated entry is dropped.
        my %seen; @data = grep { !$seen{$_}++ } @data;
    }
}
print FO @data;
print "Ferdig! Se Result.csv\n"; # "Done! See Result.csv"
sleep 2;
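
For readers who have not met the grep { !$seen{$_}++ } idiom used in the script above, here is a small sketch of what it does, next to an equivalent explicit loop. The sample data and the names @unique, @unique_loop and %count are invented for illustration.

```perl
use strict;
use warnings;

# Invented sample: one repeated heading and one repeated data line.
my @data = ("Grund til reklamat 1\n", "data\n", "data\n",
            "Grund til reklamat 1\n", "data\n");

# The idiom: $seen{$_}++ returns the old count, so !$seen{$_}++ is true
# only the first time a given value is encountered.
my %seen;
my @unique = grep { !$seen{$_}++ } @data;

# The same filter written as an explicit loop.
my (%count, @unique_loop);
for my $item (@data) {
    push @unique_loop, $item unless $count{$item};    # first time this value appears
    $count{$item}++;
}

# Both lists hold the heading once and "data\n" once.
print scalar(@unique), " elements kept\n";
```

Because the filter looks at every element, repeated data lines are dropped along with repeated headings, and a single call handles the whole array. To keep repeated data lines, the %seen test would have to be applied only to lines that match the heading pattern.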



Response Number 7
Name: Shr0Om
Date: August 29, 2007 at 06:19:15 Pacific
Reply:

Mechanix2Go:

I didn't think there was a solution for this in batch, but I see you figured something out here. Still, I'm a little baffled about how this works, and it made me curious.

& if errorlevel 1 (
echo %%a >> new.a ::Ok,if heading found,echo to new file
) else (#If heading not found..What happens here?
find "%%a" < new.a > nul & if errorlevel 1 echo %%a >> new.a

Still, since the script has to process a pretty large text file (7000 lines), a batch file would be too inefficient.




Response Number 8
Name: Mechanix2Go
Date: August 29, 2007 at 08:25:52 Pacific
Reply:

The first echo >> puts in every data line.

The ELSE part puts in the 'label' only once.

Yeah, it could take a while with a big file. Hopefully it doesn't need to run often.






