Find duplicate value and create an

Name: ricky007
Date: February 22, 2008 at 08:16:24 Pacific
OS: solaris
CPU/Ram: solaris
Product: 10
Comment:

I need a Perl script that runs every midnight via cron and e-mails a few users whenever it finds a duplicated value in /etc/hosts. The file has three columns, sometimes two, in the format <IP>TAB<DeviceName>TAB<Description>; I don't care about the third column. The script should look for a duplicate IP or a duplicate device name, and it must not ignore rows that start with "#". Here is an example of the file:

#65.14.169.2 cpr192i00tys cpr192i00tys.tys.bellsouth.net#Ritchie Tractor
65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.246 cpr191i00tys cpr191i00tys.tys.bellsouth.net#Brother_s_Cove
65.14.169.34 cpr197i00tys cpr197i00tys.tys.bellsouth.net#Precision_Boilers
65.14.169.10 cpr194i00tys cpr194i00tys.tys.bellsouth.net#FDI_Technologies
65.14.169.46 cpr199i00tys cpr199i00tys.tys.bellsouth.net#Woods_Memorial_Hosp
65.14.169.38 cpr198i00tys cpr198i00tys.tys.bellsouth.net#Atwork_Personel


Thanks.




Response Number 1
Name: FishMonger
Date: February 22, 2008 at 10:16:26 Pacific
Reply:

The sample lines you posted don't have any duplicates.

What have you tried?

What portion of the task are you having trouble with?

What error(s) are you receiving?

This sounds like a homework assignment; is it?

If this is not a homework assignment and you haven't tried to write the script, do you know anything about Perl?



Response Number 2
Name: ricky007
Date: February 25, 2008 at 06:45:11 Pacific
Reply:

This is not a homework assignment. I have identified duplicates with a simple sort command, but that is not efficient. As I mentioned, I need to run the script via cron and e-mail a few users when duplicates are found. Here are some of the duplicates I have found; some of the duplicates do not have "#" in front of them. Thanks.

#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.20.230 cpr133i00mco cpr133i00mco.mco.bellsouth.net#R
#172.17.210.82 cpr078i00mem cpr078i00mem.mem.bellsouth.net
#172.17.253.222 cpr020i00mem cpr020i00mem.mem.bellsouth.net
#172.17.82.142 cpr086i00jax cpr086i00jax.jax.bellsouth.net#Onyx




Response Number 3
Name: FishMonger
Date: February 25, 2008 at 08:25:48 Pacific
Reply:

There are several approaches that can be taken and part of the decision depends on your exact requirements.

Based on your requirements in your original post, I'd expect these to be considered duplicates.


65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
But your second example would seem to indicate that you're only concerned with duplicate IPs.

The following example test script creates 2 hashes, 1 based on the IP and 1 on the hostnames. It then loops through them and prints out the duplicates.


#!/usr/bin/perl

use strict;
use warnings;

my (%ip, %host);
while (<DATA>) {

    if( /^#?([\d.]+)\s+(\S+)/ ) {
        my ($ip, $host) = ($1, $2);
        push @{$ip{$ip}}, $_;
        push @{$host{$host}}, $_;
    }
}

print "Extract duplicate IP's which may or may not have duplicate hostnames\n";
foreach my $ip ( keys %ip ) {
    if ( @{$ip{$ip}} > 1 ) {
        print @{$ip{$ip}};
    }
}

print "\nExtract duplicate hostnames which may or may not have duplicate IP's\n";
foreach my $host ( keys %host ) {
    if ( @{$host{$host}} > 1 ) {
        print @{$host{$host}};
    }
}

__DATA__
#65.14.169.2 cpr192i00tys cpr192i00tys.tys.bellsouth.net#Ritchie Tractor
65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.246 cpr191i00tys cpr191i00tys.tys.bellsouth.net#Brother_s_Cove
65.14.169.34 cpr197i00tys cpr197i00tys.tys.bellsouth.net#Precision_Boilers
65.14.169.34 cpr196i00tys cpr196i00tys.tys.bellsouth.net#Precision_Boilers
65.14.169.10 cpr194i00tys cpr194i00tys.tys.bellsouth.net#FDI_Technologies
65.14.169.46 cpr199i00tys cpr199i00tys.tys.bellsouth.net#Woods_Memorial_Hosp
65.14.169.38 cpr198i00tys cpr198i00tys.tys.bellsouth.net#Atwork_Personel
#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.20.230 cpr133i00mco cpr133i00mco.mco.bellsouth.net#R
#172.17.210.82 cpr078i00mem cpr078i00mem.mem.bellsouth.net
#172.17.253.222 cpr020i00mem cpr020i00mem.mem.bellsouth.net
#172.17.82.142 cpr086i00jax cpr086i00jax.jax.bellsouth.net#Onyx


Take note of IPs 65.14.168.242 and 65.14.168.243.
Would you consider them duplicates because they have the same hostname?
If not, then we could drop the hostname hash and only be concerned with the IP duplicates. Duplicate hostnames with different IPs in the same subnet are common when dealing with Spanning Tree.


Response Number 4
Name: ricky007
Date: February 25, 2008 at 08:47:58 Pacific
Reply:

Thank you very much
Yes, the logic needs to look for duplicate IPs and duplicate devices. Yes, IPs 65.14.168.242 and 65.14.168.243 are also duplicates because they share the same device name.



Response Number 5
Name: FishMonger
Date: February 25, 2008 at 09:22:11 Pacific
Reply:

OK, let's refine the test script a little. Currently it has some duplication of its own that you may or may not want. If you run the script, you'll see that several lines are reported under both hashes. Here's an adjusted version that consolidates the duplicates: the first line seen for an IP or hostname is recorded, and any later line that repeats either one is reported once.


#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (%ip, %host, @duplicates);
while (<DATA>) {

    if( /^#?([\d.]+)\s+(\S+)/ ) {
        my ($ip, $host) = ($1, $2);
        if ( defined $ip{$1} or defined $host{$2} ) {
            chomp;
            push @duplicates, $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}

print Dumper \@duplicates;
exit;
# I left out the DATA section just to keep this post short, but it's in the script


This is what it should output:

$VAR1 = [
'65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics',
'65.14.169.34 cpr196i00tys cpr196i00tys.tys.bellsouth.net#Precision_Boilers',
'#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal',
'#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT'
];



Response Number 6
Name: ricky007
Date: February 25, 2008 at 09:57:20 Pacific
Reply:

I ran it against my input file, which has 24951 lines, but I am getting an error message:

$ ./CheckDupDIP host1
Name "main::DATA" used only once: possible typo at ./CheckDupDIP line 6.
readline() on closed filehandle main::DATA at ./CheckDupDIP line 6.
$VAR1 = [];


Here is the script file:

$ cat CheckDupDIP
#!/opt/sa/bin/perl
use strict;
use warnings;
use Data::Dumper;
my (%ip, %host, @duplicates);
while (<DATA>) {
if( /^#?([\d.]+)\s+(\S+)/ ) {
my ($ip, $host) = ($1, $2);
if ( defined $ip{$1} or defined $host{$2} ) {
chomp;
push @duplicates, $_;
} else {
push @{$ip{$ip}}, $_;
push @{$host{$host}}, $_;
}
}
}
print Dumper \@duplicates;
exit;



Response Number 7
Name: ricky007
Date: February 25, 2008 at 10:33:41 Pacific
Reply:

How do I make this script run against /etc/hosts every day at midnight via cron and then e-mail the team members the output?

Thank you very much for your help on this.



Response Number 8
Name: FishMonger
Date: February 25, 2008 at 10:34:49 Pacific
Reply:

It looks like you missed reading this line:

# I left out the DATA section just to keep this post short, but it's in the script

Here's an adjusted version that reads your file instead of hard coding the example lines in the script.


#!/opt/sa/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (%ip, %host, @duplicates);
my $host_file = '/etc/host1'; # change path and filename as needed

open my $file, '<', $host_file or die "can't open $host_file $!";
while (<$file>) {

    if( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $host{$host} ) {
            chomp;
            push @duplicates, $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}
close $file;

print Dumper \@duplicates;
exit;



Response Number 9
Name: FishMonger
Date: February 25, 2008 at 10:46:08 Pacific
Reply:

The crontab man page will give you the proper syntax. On Solaris the entry format is described in crontab(1); on Linux it is in section 5.

man -s 1 crontab

I was holding off showing the email code until we finalized the details of the parsing, but if you want I can add that in now.



Response Number 10
Name: FishMonger
Date: February 25, 2008 at 11:06:06 Pacific
Reply:


#!/opt/sa/bin/perl

use strict;
use warnings;
use MIME::Lite;

my (%ip, %host);
my $duplicates = ''; # start empty so the message never interpolates undef
my $host_file = 'host1.txt'; # change path and filename as needed

open my $file, '<', $host_file or die "can't open $host_file $!";
while (<$file>) {

    if( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $host{$host} ) {
            # this IP or hostname was already seen on an earlier line
            $duplicates .= $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}
close $file;

# no duplicates found, so there is nothing to mail
exit unless length $duplicates;

my $email_msg = <<EMAIL_MSG;
The following entries in the host file are duplicates
either by IP address or by hostname.

$duplicates

EMAIL_MSG

my $email = MIME::Lite->new(
    From => 'me@myhost.com',
    To => 'you@yourhost.com',
    Cc => 'some@other.com, some@more.com',
    Subject => 'Host file duplicates',
    Data => $email_msg
);
$email->send;




Response Number 11
Name: ricky007
Date: February 25, 2008 at 11:14:14 Pacific
Reply:

Superb! You are a genius, man!
It worked against my host file. Now I am checking the e-mail portion. I will figure out the cron option. Thanks.



Response Number 12
Name: ricky007
Date: February 25, 2008 at 11:38:20 Pacific
Reply:

I am getting a compilation error:

$ ./CheckDupDIP
Global symbol "$duplicates" requires explicit package name at ./CheckDupDIP line 21.
Execution of ./CheckDupDIP aborted due to compilation errors.




Response Number 13
Name: ricky007
Date: February 25, 2008 at 12:24:29 Pacific
Reply:

I have added use Data::Dumper; at the beginning and
replaced "print @duplicates"
with print Dumper \@duplicates

I am getting e-mails, but please let me know if I am on the right track.



Response Number 14
Name: ricky007
Date: February 25, 2008 at 12:43:26 Pacific
Reply:

It is working now. Thanks a lot for helping with this.



Response Number 15
Name: FishMonger
Date: February 25, 2008 at 15:26:55 Pacific
Reply:

If you want, we can reduce its memory usage, which should also make it more efficient, by using a counter instead of the HoA (hash of arrays).

Change the else clause to this:


else {
    $ip{$ip}++;
    $host{$host}++;
}

Another change, which would need to be benchmarked to see whether it is more efficient, would be to use the split() function instead of the regex. However, with a file this small and simple, we'd only be talking about fractions of a second.
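
One way such a comparison could be run is with the core Benchmark module. This is only a sketch: the helper names parse_regex and parse_split and the three sample lines are assumptions made for illustration, and the timings it prints will vary with the real file and the Perl build.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample lines in the thread's whitespace-separated format.
my @lines = (
    "65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics\n",
    "#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal\n",
    "#172.17.210.82 cpr078i00mem cpr078i00mem.mem.bellsouth.net\n",
);

# Regex version: the same pattern used in the scripts above.
sub parse_regex {
    my ($line) = @_;
    return $line =~ /^#?([\d.]+)\s+(\S+)/ ? ($1, $2) : ();
}

# split() version: take the first two whitespace-separated fields,
# drop a leading "#", and reject anything that is not digits and dots.
sub parse_split {
    my ($line) = @_;
    my ($ip, $host) = split ' ', $line;
    return () unless defined $host;
    $ip = substr($ip, 1) if substr($ip, 0, 1) eq '#';
    return () if $ip eq '' || $ip =~ tr/0-9.//c;
    return ($ip, $host);
}

# Run each parser for about one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { for (@lines) { my @fields = parse_regex($_) } },
    split => sub { for (@lines) { my @fields = parse_split($_) } },
});
```

The split() version is not entirely free: it still has to strip the "#" and validate the first field, which is part of why the result needs measuring rather than guessing.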


Response Number 16
Name: FishMonger
Date: February 25, 2008 at 15:34:01 Pacific
Reply:

And we can reduce the memory usage further by using only 1 hash instead of 2.


while (<$file>) {

    if( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $ip{$host} ) {
            $duplicates .= $_;
        }
        else {
            $ip{$ip}++;
            $ip{$host}++;
        }
    }
}




Response Number 17
Name: ricky007
Date: February 26, 2008 at 06:46:48 Pacific
Reply:

Thank you.
I need one more big favor: instead of the output showing IP and host duplicates together, I would like to see the duplicate IPs only and then the duplicate hosts only. Thanks.



Response Number 18
Name: FishMonger
Date: February 26, 2008 at 13:04:26 Pacific
Reply:

Your answer is in the first example test script that I posted. Post back if you can't figure out how to extract the data for the email.
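
For readers following along, here is a rough sketch of how the two-hash approach from the first test script could be turned into text for the email body. The function name duplicate_report, the section headings, and the sample lines are assumptions for illustration and not part of the original scripts; it also assumes that showing the full matching lines in each section is what is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Builds a two-section report: lines sharing an IP, then lines sharing a hostname.
sub duplicate_report {
    my (@lines) = @_;
    my (%ip, %host);

    # Group every parsable line under its IP and under its hostname.
    for my $line (@lines) {
        if ( my ($ip, $host) = $line =~ /^#?([\d.]+)\s+(\S+)/ ) {
            push @{ $ip{$ip} },     $line;
            push @{ $host{$host} }, $line;
        }
    }

    # Keys are sorted so the report order is stable from run to run.
    my $report = "Duplicate IPs\n";
    for my $ip ( sort keys %ip ) {
        $report .= join '', @{ $ip{$ip} } if @{ $ip{$ip} } > 1;
    }
    $report .= "\nDuplicate hostnames\n";
    for my $host ( sort keys %host ) {
        $report .= join '', @{ $host{$host} } if @{ $host{$host} } > 1;
    }
    return $report;
}

# Hypothetical sample input: one shared hostname and one shared IP.
my @sample = (
    "65.14.168.242 cpr195i00tys\n",
    "65.14.168.243 cpr195i00tys\n",
    "65.14.169.34 cpr197i00tys\n",
    "65.14.169.34 cpr196i00tys\n",
);
print duplicate_report(@sample);
```

In the mail script, the returned string could stand in for $duplicates when reading the real file, for example by passing the lines read from $file. Because the headings are always present, a separate check would be needed to skip the mail when neither section has entries.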

