Removing duplicate rows using Perl

Phanikumar January 29, 2009 at 16:30:36
I am relatively new to Perl and I have an array with two columns: the first is an index and the second is a value. I want to find the entries whose index is duplicated and remove them from the array.

E.g., if the input is @inputarray:
@inputarray= [1 45
2 34
1 33
3 60
4 44
1 22];

I want the output to be:
@outputarray= [2 34
3 60
4 44]


January 29, 2009 at 18:53:00
You probably mean you have a two-dimensional array, but what you posted isn't one. As written, it is a syntax error: "Number found where operator expected".
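
For reference, here is a minimal sketch of how the posted data could be declared as a two-dimensional array, assuming each row is stored as an array reference of the form [ index, value ]. That layout is my assumption; the original post does not say how the data is actually held.

```perl
use strict;
use warnings;

# Each row is an anonymous array reference: [ index, value ].
# (Assumed layout; the original post only shows the raw numbers.)
my @inputarray = (
    [ 1, 45 ],
    [ 2, 34 ],
    [ 1, 33 ],
    [ 3, 60 ],
    [ 4, 44 ],
    [ 1, 22 ],
);

# Two subscripts select one cell: third row (2), second column (1).
print "$inputarray[2][1]\n";    # prints 33

# Walk all rows and print each "index value" pair.
print "$_->[0] $_->[1]\n" for @inputarray;
```

Note the parentheses around the list and the commas between elements; square brackets alone would create a single array reference rather than a list of rows.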

When you think duplicate, think hash, not an array.

C:\>perldoc -q duplicate
Found in C:\Perl\lib\pod\perlfaq4.pod
  How can I remove duplicate elements from a list or array?
    (contributed by brian d foy)

    Use a hash. When you think the words "unique" or "duplicated", think
    "hash keys".

    If you don't care about the order of the elements, you could just create
    the hash then extract the keys. It's not important how you create that
    hash: just that you use "keys" to get the unique elements.

       my %hash   = map { $_, 1 } @array;
       # or a hash slice: @hash{ @array } = ();
       # or a foreach: $hash{$_} = 1 foreach ( @array );

       my @unique = keys %hash;

    You can also go through each element and skip the ones you've seen
    before. Use a hash to keep track. The first time the loop sees an
    element, that element has no key in %seen. The "next" statement creates
    the key and immediately uses its value, which is "undef", so the loop
    continues to the "push" and increments the value for that key. The next
    time the loop sees that same element, its key exists in the hash *and*
    the value for that key is true (since it's not 0 or undef), so the next
    skips that iteration and the loop goes to the next element.

            my @unique = ();
            my %seen   = ();

            foreach my $elem ( @array ) {
                    next if $seen{ $elem }++;
                    push @unique, $elem;
            }

    You can write this more briefly using a grep, which does the same thing.

       my %seen = ();
       my @unique = grep { ! $seen{ $_ }++ } @array;
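
Note that the FAQ recipes above keep the first occurrence of each element, whereas the desired output in the question drops every row whose index appears more than once. One way to get that might be a two-pass approach: count the indexes with a hash, then keep only the rows whose count is 1. This is only a sketch, assuming the rows are stored as [ index, value ] array references; %count is a name I made up for illustration.

```perl
use strict;
use warnings;

# Assumed layout: one array reference per row, [ index, value ].
my @inputarray = (
    [ 1, 45 ], [ 2, 34 ], [ 1, 33 ],
    [ 3, 60 ], [ 4, 44 ], [ 1, 22 ],
);

# Pass 1: count how many times each index occurs.
my %count;
$count{ $_->[0] }++ for @inputarray;

# Pass 2: keep only the rows whose index occurred exactly once.
# grep preserves the original row order.
my @outputarray = grep { $count{ $_->[0] } == 1 } @inputarray;

print "$_->[0] $_->[1]\n" for @outputarray;
# prints:
# 2 34
# 3 60
# 4 44
```

Two passes are needed because, on the first sighting of an index, there is no way to know yet whether it will show up again later in the array.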

