Find duplicate value and create an

Original Message
Name: ricky007
Date: February 22, 2008 at 08:16:24 Pacific
Subject: Find duplicate value and create an
OS: solaris
CPU/Ram: solaris
Model/Manufacturer: 10
Comment:
I need a Perl script that runs every midnight via cron and e-mails a few users when it finds any duplicated value in /etc/hosts. The file has three columns, and sometimes only two. The script should look for a duplicate IP or a duplicate device name, and it must not ignore rows that start with "#". The file format is <IP>TAB<DeviceName>TAB<Description>; I don't care about the third column. Here is an example of the file:

#65.14.169.2 cpr192i00tys cpr192i00tys.tys.bellsouth.net#Ritchie Tractor
65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.246 cpr191i00tys cpr191i00tys.tys.bellsouth.net#Brother_s_Cove
65.14.169.34 cpr197i00tys cpr197i00tys.tys.bellsouth.net#Precision_Boilers
65.14.169.10 cpr194i00tys cpr194i00tys.tys.bellsouth.net#FDI_Technologies
65.14.169.46 cpr199i00tys cpr199i00tys.tys.bellsouth.net#Woods_Memorial_Hosp
65.14.169.38 cpr198i00tys cpr198i00tys.tys.bellsouth.net#Atwork_Personel


Thanks.




Response Number 1
Name: FishMonger
Date: February 22, 2008 at 10:16:26 Pacific
Subject: Find duplicate value and create an
Reply:
The sample lines you posted don't have any duplicates.

What have you tried?

What portion of the task are you having trouble with?

What error(s) are you receiving?

This sounds like a homework assignment; is it?

If this is not a homework assignment and you haven't tried to write the script, do you know anything about Perl?



Response Number 2
Name: ricky007
Date: February 25, 2008 at 06:45:11 Pacific
Subject: Find duplicate value and create an
Reply:
This is not a homework assignment. I have identified duplicates with a simple sort command, but that is not efficient. As I mentioned, I need to run the script via cron and email a few users when duplicates are found. Here are some of the duplicates I have found; some of the duplicates also don't have "#" in front of them. Thanks

#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.20.230 cpr133i00mco cpr133i00mco.mco.bellsouth.net#R
#172.17.210.82 cpr078i00mem cpr078i00mem.mem.bellsouth.net
#172.17.253.222 cpr020i00mem cpr020i00mem.mem.bellsouth.net
#172.17.82.142 cpr086i00jax cpr086i00jax.jax.bellsouth.net#Onyx




Response Number 3
Name: FishMonger
Date: February 25, 2008 at 08:25:48 Pacific
Subject: Find duplicate value and create an
Reply:
There are several approaches that can be taken and part of the decision depends on your exact requirements.

Based on your requirements in your original post, I'd expect these to be considered duplicates.


65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
But your second example would seem to indicate that you're only concerned with duplicate IP's.

The following example test script creates 2 hashes, 1 based on the IP and 1 on the hostnames. It then loops through them and prints out the duplicates.


#!/usr/bin/perl

use strict;
use warnings;

my (%ip, %host);
while (<DATA>) {

    if ( /^#?([\d.]+)\s+(\S+)/ ) {
        my ($ip, $host) = ($1, $2);
        push @{$ip{$ip}}, $_;
        push @{$host{$host}}, $_;
    }
}

print "Extract duplicate IP's which may or may not have duplicate hostnames\n";
foreach my $ip ( keys %ip ) {
    if ( @{$ip{$ip}} > 1 ) {
        print @{$ip{$ip}};
    }
}

print "\nExtract duplicate hostnames which may or may not have duplicate IP's\n";
foreach my $host ( keys %host ) {
    if ( @{$host{$host}} > 1 ) {
        print @{$host{$host}};
    }
}

__DATA__
#65.14.169.2 cpr192i00tys cpr192i00tys.tys.bellsouth.net#Ritchie Tractor
65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics
65.14.168.246 cpr191i00tys cpr191i00tys.tys.bellsouth.net#Brother_s_Cove
65.14.169.34 cpr197i00tys cpr197i00tys.tys.bellsouth.net#Precision_Boilers
65.14.169.34 cpr196i00tys cpr196i00tys.tys.bellsouth.net#Precision_Boilers
65.14.169.10 cpr194i00tys cpr194i00tys.tys.bellsouth.net#FDI_Technologies
65.14.169.46 cpr199i00tys cpr199i00tys.tys.bellsouth.net#Woods_Memorial_Hosp
65.14.169.38 cpr198i00tys cpr198i00tys.tys.bellsouth.net#Atwork_Personel
#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT
#172.17.20.230 cpr133i00mco cpr133i00mco.mco.bellsouth.net#R
#172.17.210.82 cpr078i00mem cpr078i00mem.mem.bellsouth.net
#172.17.253.222 cpr020i00mem cpr020i00mem.mem.bellsouth.net
#172.17.82.142 cpr086i00jax cpr086i00jax.jax.bellsouth.net#Onyx


Take note of IPs 65.14.168.242 and 65.14.168.243.
Would you consider them duplicates because they have the same hostname?
If not, then we could drop the hostname hash and only be concerned with the IP duplicates. Duplicate hostnames with different IP's in the same subnet are common when dealing with Spanning Tree.


Response Number 4
Name: ricky007
Date: February 25, 2008 at 08:47:58 Pacific
Subject: Find duplicate value and create an
Reply:
Thank you very much.
Yes, the logic needs to look for duplicate IPs and duplicate devices. And yes, IPs 65.14.168.242 and 65.14.168.243 are also duplicates because they share the same device name.


Response Number 5
Name: FishMonger
Date: February 25, 2008 at 09:22:11 Pacific
Subject: Find duplicate value and create an
Reply:
OK, let's refine the test script a little. Currently it has some duplication of its own that you may or may not want. If you run the script, you'll see that several lines are duplicated across both hashes. Here's an adjusted version that consolidates the duplications.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (%ip, %host, @duplicates);
while (<DATA>) {

    if ( /^#?([\d.]+)\s+(\S+)/ ) {
        my ($ip, $host) = ($1, $2);
        if ( defined $ip{$1} or defined $host{$2} ) {
            chomp;
            push @duplicates, $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}

print Dumper \@duplicates;
exit;
# I left out the DATA section just to keep this post short, but it's in the script


This is what it should output:

$VAR1 = [
'65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics',
'65.14.169.34 cpr196i00tys cpr196i00tys.tys.bellsouth.net#Precision_Boilers',
'#172.17.147.46 cpr065i00bna cpr065i00bna.bna.bellsouth.net#Norandal',
'#172.17.149.102 cpr146i00bna cpr146i00bna.bna.bellsouth.net#EFT'
];




Response Number 6
Name: ricky007
Date: February 25, 2008 at 09:57:20 Pacific
Subject: Find duplicate value and create an
Reply:
I ran it against my input file, which has 24951 lines, but I am getting an error message:

$ ./CheckDupDIP host1
Name "main::DATA" used only once: possible typo at ./CheckDupDIP line 6.
readline() on closed filehandle main::DATA at ./CheckDupDIP line 6.
$VAR1 = [];


Here is the script file:

$ cat CheckDupDIP
#!/opt/sa/bin/perl
use strict;
use warnings;
use Data::Dumper;
my (%ip, %host, @duplicates);
while (<DATA>) {
if( /^#?([\d.]+)\s+(\S+)/ ) {
my ($ip, $host) = ($1, $2);
if ( defined $ip{$1} or defined $host{$2} ) {
chomp;
push @duplicates, $_;
} else {
push @{$ip{$ip}}, $_;
push @{$host{$host}}, $_;
}
}
}
print Dumper \@duplicates;
exit;



Response Number 7
Name: ricky007
Date: February 25, 2008 at 10:33:41 Pacific
Subject: Find duplicate value and create an
Reply:
How do I make this script run against /etc/hosts every day at midnight via cron and then e-mail the output to team members?

Thank you very much for your help on this.



Response Number 8
Name: FishMonger
Date: February 25, 2008 at 10:34:49 Pacific
Subject: Find duplicate value and create an
Reply:
It looks like you missed reading this line:
# I left out the DATA section just to keep this post short, but it's in the script

Here's an adjusted version that reads your file instead of hard coding the example lines in the script.


#!/opt/sa/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (%ip, %host, @duplicates);
my $host_file = '/etc/host1'; # change path and filename as needed

open my $file, '<', $host_file or die "can't open $host_file $!";
while (<$file>) {

    if ( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $host{$host} ) {
            chomp;
            push @duplicates, $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}
close $file;

print Dumper \@duplicates;
exit;



Response Number 9
Name: FishMonger
Date: February 25, 2008 at 10:46:08 Pacific
Subject: Find duplicate value and create an
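Reply:
The crontab man page will give you the proper syntax. On Linux the file format is in section 5:

man 5 crontab

On Solaris the entry format is described in crontab(1), so plain man crontab is enough.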

I was holding off showing the email code until we finalized the details of the parsing, but if you want I can add that in now.



Response Number 10
Name: FishMonger
Date: February 25, 2008 at 11:06:06 Pacific
Subject: Find duplicate value and create an
Reply:

#!/opt/sa/bin/perl

use strict;
use warnings;
use MIME::Lite;

my (%ip, %host);
my $duplicates = '';         # start empty so the check below never sees undef
my $host_file = 'host1.txt'; # change path and filename as needed

open my $file, '<', $host_file or die "can't open $host_file $!";
while (<$file>) {

    if ( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $host{$host} ) {
            $duplicates .= $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}
close $file;

exit if $duplicates eq '';   # no duplicates found, so don't send an email

my $email_msg = <<EMAIL_MSG;
The following entries in the host file are duplicates
either by IP address or by hostname.

$duplicates

EMAIL_MSG

my $email = MIME::Lite->new(
    From    => 'me@myhost.com',
    To      => 'you@yourhost.com',
    Cc      => 'some@other.com, some@more.com',
    Subject => 'Host file duplicates',
    Data    => $email_msg
);
$email->send;




Response Number 11
Name: ricky007
Date: February 25, 2008 at 11:14:14 Pacific
Subject: Find duplicate value and create an
Reply:
Superb! You are a genius, man!
It worked against my hosts file. Now I am checking the e-mail portion. I will figure out the cron option. Thanks.


Response Number 12
Name: ricky007
Date: February 25, 2008 at 11:38:20 Pacific
Subject: Find duplicate value and create an
Reply:
I am getting a compilation error:

$ ./CheckDupDIP
Global symbol "$duplicates" requires explicit package name at ./CheckDupDIP line 21.
Execution of ./CheckDupDIP aborted due to compilation errors.




Response Number 13
Name: ricky007
Date: February 25, 2008 at 12:24:29 Pacific
Subject: Find duplicate value and create an
Reply:
I added use Data::Dumper; at the beginning and
replaced "print @duplicates"
with print Dumper \@duplicates

I am getting e-mails, but please let me know if I am on the right track.



Response Number 14
Name: ricky007
Date: February 25, 2008 at 12:43:26 Pacific
Subject: Find duplicate value and create an
Reply:
It is working now. Thanks a lot for helping with this.


Response Number 15
Name: FishMonger
Date: February 25, 2008 at 15:26:55 Pacific
Subject: Find duplicate value and create an
Reply:
If you want, we can reduce its memory usage, which should also make it more efficient, by using a counter instead of the HoA (hash of arrays).

Change the else clause to this:


else {
    $ip{$ip}++;
    $host{$host}++;
}

Another change, which would need to be benchmarked to see whether it is more efficient, would be to use the split() function instead of the regex. However, with a file this small and simple, we'd only be talking about fractions of a second.
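
For anyone wanting to try the split() idea, here is a minimal sketch; it is an illustration, not code from the thread. The helper name parse_line is made up for the example. It strips one leading "#", splits on whitespace, and still checks that the first field looks like an IP, so ordinary comment lines are skipped much as the regex skips them. One difference to be aware of: split ' ' tolerates leading whitespace, which the anchored regex does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (ip, host) for a hosts-file line, or an
# empty list when the line does not look like an entry.
sub parse_line {
    my ($line) = @_;
    $line =~ s/^#//;                       # keep commented-out entries too
    my ($ip, $host) = split ' ', $line;    # split on runs of whitespace
    return unless defined $host;           # fewer than two fields
    return unless $ip =~ /^[\d.]+$/;       # first field must look like an IP
    return ($ip, $host);
}

# Sample line taken from the thread, with tabs between the columns.
my ($ip, $host) = parse_line(
    "#172.17.147.46\tcpr065i00bna\tcpr065i00bna.bna.bellsouth.net#Norandal\n"
);
print "$ip $host\n";
```

Whether this is actually faster than the regex would have to be measured, for example with cmpthese from the core Benchmark module.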


Response Number 16
Name: FishMonger
Date: February 25, 2008 at 15:34:01 Pacific
Subject: Find duplicate value and create an
Reply:
And we can reduce the memory usage further by using only 1 hash instead of 2.

while (<$file>) {

    if ( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $ip{$host} ) {
            $duplicates .= $_;
        }
        else {
            $ip{$ip}++;
            $ip{$host}++;
        }
    }
}
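
One behaviour of the if/else form is worth knowing: a line that gets flagged is never recorded, so its other key stays unseen. If a line is flagged because of its IP, a later line with a new IP but the same hostname is not flagged. The sketch below is one possible variation, not the thread's code, that counts both keys on every line. The sub name duplicate_lines, the %seen hash, and the three sample lines are made up for illustration. Like the single-hash version above, it assumes a hostname never looks like an IP.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every line whose IP or hostname has
# already appeared on an earlier line.
sub duplicate_lines {
    my (@lines) = @_;
    my (%seen, @dups);
    for my $line (@lines) {
        next unless $line =~ /^#?([\d.]+)\s+(\S+)/;
        my ($ip, $host) = ($1, $2);
        # Post-increment yields the old count, so each flag is true only
        # for a key seen before; both counters are updated on every line.
        my $ip_seen   = $seen{$ip}++;
        my $host_seen = $seen{$host}++;
        push @dups, $line if $ip_seen or $host_seen;
    }
    return @dups;
}

# Invented sample: line 2 repeats an IP, line 3 repeats line 2's hostname.
my @sample = ("10.0.0.1 alpha\n", "10.0.0.1 beta\n", "10.0.0.2 beta\n");
print duplicate_lines(@sample);
```

With the if/else form only the second sample line would be reported; here the third one is reported as well.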




Response Number 17
Name: ricky007
Date: February 26, 2008 at 06:46:48 Pacific
Subject: Find duplicate value and create an
Reply:
Thank you.
I need one more big favor: instead of the output showing both IP and host duplicates together, I would like to see the duplicate IPs only and then the duplicate hosts only. Thanks.


Response Number 18
Name: FishMonger
Date: February 26, 2008 at 13:04:26 Pacific
Subject: Find duplicate value and create an
Reply:
Your answer is in the first example test script that I posted. Post back if you can't figure out how to extract the data for the email.
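
As a rough sketch of what that could look like, the hash-of-arrays idea from the first test script can be turned into two separate report sections. This is an illustration under assumptions, not the poster's final code: the sub names find_duplicates and format_section and the section titles are invented here, and the sample lines are copied from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group lines by IP and by hostname, then keep only
# the groups that contain more than one line.
sub find_duplicates {
    my (@lines) = @_;
    my (%by_ip, %by_host);
    for my $line (@lines) {
        next unless $line =~ /^#?([\d.]+)\s+(\S+)/;
        push @{ $by_ip{$1} },   $line;
        push @{ $by_host{$2} }, $line;
    }
    my (%dup_ip, %dup_host);
    for my $ip (keys %by_ip) {
        $dup_ip{$ip} = $by_ip{$ip} if @{ $by_ip{$ip} } > 1;
    }
    for my $host (keys %by_host) {
        $dup_host{$host} = $by_host{$host} if @{ $by_host{$host} } > 1;
    }
    return (\%dup_ip, \%dup_host);
}

# Build one section of the report; keys are sorted for stable output.
sub format_section {
    my ($title, $dups) = @_;
    my $text = "$title\n";
    for my $key (sort keys %$dups) {
        $text .= join '', @{ $dups->{$key} };
    }
    return $text;
}

my @sample = (
    "65.14.168.242 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics\n",
    "65.14.168.243 cpr195i00tys cpr195i00tys.tys.bellsouth.net#Knoxville_Pediatrics\n",
    "65.14.169.34 cpr197i00tys cpr197i00tys.tys.bellsouth.net#Precision_Boilers\n",
    "65.14.169.34 cpr196i00tys cpr196i00tys.tys.bellsouth.net#Precision_Boilers\n",
);
my ($dup_ip, $dup_host) = find_duplicates(@sample);
print format_section("Duplicate IPs:", $dup_ip), "\n",
      format_section("Duplicate hostnames:", $dup_host);
```

The two formatted sections could be joined into the string that is passed as Data to MIME::Lite in the email script, with the email skipped when both hashes are empty.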

All content ©1996-2007 Computing.Net, LLC