Perl, Delete duplicates in array
Original Message
Name: Shr0Om
Date: August 24, 2007 at 03:45:09 Pacific
Subject: Perl, Delete duplicates in array
OS: XP
CPU/Ram: amd 64 3200
Model/Manufacturer: custom
Comment: I need some help figuring out how to delete only certain duplicate values in an array. I know how to delete all duplicates in the entire array, but the array might also contain duplicates that are supposed to be there. So I need to filter out the lines I want to compare first.

Array looks somewhat like this:

Grund til reklamat 1
data
data
Grund til reklamat 1
data
data
data
Grund til reklamat 2
data
Grund til reklamat 2
data
data
Grund til reklamat 3
.
.

The result should instead look like this:

Grund til reklamat 1
data
data
data
data
data
Grund til reklamat 2
data
data
data
Grund til reklamat 3
.
.

I've got most of the script complete, but I'm not sure how to remove only those duplicates containing the string "Grund til reklamat". Here's my script so far:

#Script start
unlink 'result.csv';
use strict;
my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l);
$/ = "";
my @dir = <*.csv>;
print "Behandler: $dir[0]\n";
my @data;
open (F, "$dir[0]") || die "open failed $!";
open (FO, '>Result.csv') || die "open failed $!";
print FO "Return;Customer;New;Product Nr;Product description;New date;Original;Old date;Creator\n";
while(<F>) {
    next if /(CL08001|squib|Salgsorganisation|Salgskanal|Division|Bruger|Type)/;   # Discarding some headings
    s/[\t\n]+/\t/g;
    s/\t+$//;
    s/['"']+//g;
    ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = split ("\t", $_);
    my ($first,$rest) = split (/\s/, $i,2 );
    push(@data,"$a;$b;$c;$first;$rest;$d;$e;$f;$g\n");
}
foreach (@data) {
    $i++;
    if($_ =~ m/Grund til reklamat/) {
        #Delete all duplicates containing string "Grund til reklamat" here.
    }
}
print FO @data;
close F;
close FO;
Response Number 1
Name: Mechanix2Go
Date: August 26, 2007 at 12:55:25 Pacific
Reply: I dunno perl, but this bat should do it.
========================================
@echo off > new.a
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%a in (my.a) do (
  echo %%a | find "Grund til reklamat" > nul & if errorlevel 1 (
    echo echo %%a >> new.a
  ) else (
    find "%%a" < new.a > nul & if errorlevel 1 echo %%a >> new.a
  )
)
=====================================
If at first you don't succeed, you're about average.
M2
Response Number 2
Name: Mechanix2Go
Date: August 26, 2007 at 15:51:08 Pacific
Reply: ***************** CORRECTION **********
@echo off > new.a
setLocal EnableDelayedExpansion

for /f "tokens=* delims= " %%a in (my.a) do (
  echo %%a | find "Grund til reklamat" > nul & if errorlevel 1 (
    echo %%a >> new.a
  ) else (
    find "%%a" < new.a > nul & if errorlevel 1 echo %%a >> new.a
  )
)
=====================================
If at first you don't succeed, you're about average.
M2
Response Number 3
Name: FishMonger
Date: August 27, 2007 at 11:25:00 Pacific
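Reply:
my %seen;   # counts how often each heading line has appeared
while (<DATA>) {
    if (/^Grund til reklamat/) {
        # print a heading only the first time it is seen
        $seen{$_}++;
        print if $seen{$_} < 2;
    }
    else {
        # data lines are always printed
        print;
    }
}

__DATA__
Grund til reklamat 1
data
data
Grund til reklamat 1
data
data
data
Grund til reklamat 2
data
Grund til reklamat 2
data
data
Grund til reklamat 3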
Response Number 4
Name: Guy
Date: August 27, 2007 at 18:28:06 Pacific
Reply: "Array looks somewhat like ....." Hmmmmmm. Is it also possible the array looks like:

Grund til reklamat 1
data
data
Grund til reklamat 2
data
Grund til reklamat 1
data
data
data
Grund til reklamat 2
data
data
Grund til reklamat 3

???? If so, my initial reaction is that the above solutions will not work.

Guy
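If the headings can turn up interleaved as in Guy's example, one possible approach is to collect each heading's data lines under the first occurrence of that heading and print the groups in first-seen order. The following is only a minimal Perl sketch under some assumptions: every array element is one line, a heading is any line matching the pattern passed in, and merge_by_heading and the sample data are hypothetical names made up for illustration, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: group data lines under the first occurrence of
# each heading, keeping headings in the order they were first seen.
sub merge_by_heading {
    my ($heading_re, @lines) = @_;
    my (@order, %group, @preamble, $current);

    for my $line (@lines) {
        if ($line =~ $heading_re) {
            $current = $line;
            if (!exists $group{$current}) {
                push @order, $current;      # remember first-seen order
                $group{$current} = [];
            }
        }
        elsif (defined $current) {
            push @{ $group{$current} }, $line;   # data under current heading
        }
        else {
            push @preamble, $line;          # lines before any heading
        }
    }
    return (@preamble, map { ($_, @{ $group{$_} }) } @order);
}

# Made-up sample input with interleaved headings.
my @input = (
    "Grund til reklamat 1", "data a", "data b",
    "Grund til reklamat 2", "data c",
    "Grund til reklamat 1", "data d",
    "Grund til reklamat 3",
);
print "$_\n" for merge_by_heading(qr/^Grund til reklamat/, @input);
```

Because the groups are stored in a hash of arrays, this is still a single pass over the input, at the cost of holding all lines in memory before anything is printed.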
Response Number 6
Name: Shr0Om
Date: August 29, 2007 at 06:04:26 Pacific
Reply: Hi, thanks to all for the responses. I've got a solution now. And the array headings are always in sorted order, so that isn't a problem. The solution was to add this to my script:

if($_ =~ m/Grund til reklamat/){
    my %seen;
    @data = grep { !$seen{$_}++ } @data;   #Added this
}

I kind of know what it does. I have to add that I haven't read up on 'grep' yet. But well, for now I just wanted to finish this script :P

Entire script. Ugly, but works:

#!/usr/bin/perl
unlink 'Result.csv';   # Deletes any earlier Result file
use strict;
my @data;
my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l);
$/ = "";   # Reads block by block (paragraph mode)
my @dir = <*.csv>;
if (@dir) {
    # @dir is not empty...
}
else {
    # @dir is empty
    print "Ingen csv filer funnet!\n";   # "No csv files found!"
    sleep 2;
    exit;
}
print "Behandler: $dir[0]\nVennligst vent...\n";   # "Processing: ... Please wait..."
open (F, "$dir[0]") || die "open failed $!";
open (FO, '>Result.csv') || die "open failed $!";
print FO "Return;Customer;New;Product Nr;Product description;New date;Original;Old date;Creator\n";
while(<F>) {
    next if /(CL08001|squib|Salgsorganisation|Salgskanal|Division|Bruger|Type)/;   # Discards lines containing one of these strings
    s/[\t\n]+/\t/g;   # Cleans up the messy csv file
    s/\t+$//;
    s/['"']+//g;
    ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j,$k,$l) = split ("\t", $_);
    my ($first,$rest) = split (/\s/, $i,2 );
    push(@data,"$a;$b;$c;$first;$rest;$d;$e;$f;$g\n");
}
close F;
foreach (@data) {
    if($_ =~ m/Grund til reklamat/) {
        my %seen;
        @data = grep { !$seen{$_}++ } @data;   # Weeds out all redundant array occurrences containing the string "Grund til reklamat"
    }
}
print FO @data;
print "Ferdig! Se Result.csv\n";   # "Done! See Result.csv"
sleep 2;
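A note on the grep line in the post above: as written, %seen is keyed on every element of @data, so grep { !$seen{$_}++ } drops any line that repeats, not only the "Grund til reklamat" headings. If duplicate data lines have to survive, as the original question asked, the heading test can go inside the grep block itself. A minimal sketch, assuming each array element is one output line; dedupe_headings and the sample data are hypothetical, made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: drop repeated heading lines only. Every other
# line passes through untouched, even if it occurs more than once.
sub dedupe_headings {
    my ($heading_re, @lines) = @_;
    my %seen;
    # Keep a line if it is not a heading, or if it is a heading that
    # has not been seen before ($seen{$_}++ is 0, i.e. false, the
    # first time a given line is encountered).
    return grep { $_ !~ $heading_re || !$seen{$_}++ } @lines;
}

# Made-up sample data in the shape described in the original question.
my @data = (
    "Grund til reklamat 1\n", "data\n", "data\n",
    "Grund til reklamat 1\n", "data\n",
    "Grund til reklamat 2\n", "data\n",
);
@data = dedupe_headings(qr/Grund til reklamat/, @data);
print @data;
```

With this form a single grep call over @data is enough, so the surrounding foreach loop (which reassigns @data while iterating over it) would no longer be needed.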
Response Number 7
Name: Shr0Om
Date: August 29, 2007 at 06:19:15 Pacific
Reply: Mechanix2Go: I didn't think there was a solution for this in batch, but I see you figured something out here. Still, I'm a little baffled about how this works, and it made me a little curious.

& if errorlevel 1 (
  echo %%a >> new.a   ::Ok, if heading found, echo to new file
) else (              #If heading not found.. What happens here?
  find "%%a" < new.a > nul & if errorlevel 1 echo %%a >> new.a

Still, since the script has to process a pretty large txt file (7000 lines), it would be too inefficient with a batch.
Response Number 8
Name: Mechanix2Go
Date: August 29, 2007 at 08:25:52 Pacific
Reply: The first echo >> puts in every data line (errorlevel 1 means find did NOT match the heading text). The ELSE part puts in the 'label' only once, and only if it isn't already in new.a. Yeah, it could take a while with a big file. Hopefully it doesn't need to run often.
=====================================
If at first you don't succeed, you're about average.
M2