Name: ricky007
Date: February 22, 2008 at 08:16:24 Pacific
Subject: Find duplicate value and create an
OS: solaris
CPU/Ram: solaris
Model/Manufacturer: 10
Comment:
I need a Perl script that will run every midnight via a cron job and e-mail a few users whenever it finds a duplicated value in the hosts file, /etc/hosts. The file has three columns, and sometimes only two. The script should look for a duplicate IP or a duplicate device name, and it must not ignore rows that start with "#". The file format is <IP>TAB<DeviceName>TAB<Description>; I really don't care about the third column. Here is an example of the file:
This is not a homework assignment. I have identified duplicates with a simple sort command, but that is not efficient. As I mentioned, I need to run the script via a cron job and e-mail a few users when duplicates are found. Here are some of the duplicates I have found; some of them also don't have "#" in front. Thanks
But your second example would seem to indicate that you're only concerned with duplicate IP's.
The following example test script creates 2 hashes, 1 based on the IP and 1 on the hostnames. It then loops through them and prints out the duplicates.
print "Extract duplicate IP's which may or may not have duplicate hostnames\n";
foreach my $ip ( keys %ip ) {
    if ( @{$ip{$ip}} > 1 ) {
        print @{$ip{$ip}};
    }
}

print "\nExtract duplicate hostnames which may or may not have duplicate IP's\n";
foreach my $host ( keys %host ) {
    if ( @{$host{$host}} > 1 ) {
        print @{$host{$host}};
    }
}
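The part of the test script that fills %ip and %host is not shown in the post. A minimal sketch of that step, assuming each hash maps a key to an array of the full lines that carried it (which is what the loops above dereference). The sample lines are made up for illustration and are not from the real hosts file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines (IP<TAB>name<TAB>description); not from the real file.
my @lines = (
    "10.0.0.1\talpha\tfirst box\n",
    "10.0.0.2\tbeta\n",
    "#10.0.0.1\tgamma\tcommented out, still counted\n",
    "10.0.0.3\tbeta\tsame name, new IP\n",
);

my (%ip, %host);
for my $line (@lines) {
    # Optional leading '#', then the IP, whitespace, then the device name.
    next unless $line =~ /^#?([\d.]+)\s+(\S+)/;
    push @{ $ip{$1} },   $line;    # every line seen for this IP
    push @{ $host{$2} }, $line;    # every line seen for this name
}

# Report any key that collected more than one line.
for my $key ( sort keys %ip ) {
    print "IP $key:\n", @{ $ip{$key} } if @{ $ip{$key} } > 1;
}
for my $key ( sort keys %host ) {
    print "Host $key:\n", @{ $host{$key} } if @{ $host{$key} } > 1;
}
```

Keeping the whole line in the array means the report can show the description column too, at the cost of holding every line in memory twice.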
Take note of IPs 65.14.168.242 and 65.14.168.243. Would you consider them duplicates because they have the same hostname? If not, then we could drop the hostname hash and only be concerned with the IP duplicates. Duplicate hostnames with different IPs in the same subnet are common when dealing with Spanning Tree.
Thank you very much. Yes, the logic needs to look for duplicate IPs and duplicate devices. And yes, IPs 65.14.168.242 and 65.14.168.243 are also duplicates because they share the same device name.
OK, let's refine the test script a little. Currently it has some duplication of its own that you may or may not want. If you run the script, you'll see that several lines are duplicated across both hashes. Here's an adjusted version that consolidates the duplications.
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (%ip, %host, @duplicates);

while (<DATA>) {
    if ( /^#?([\d.]+)\s+(\S+)/ ) {
        my ($ip, $host) = ($1, $2);
        if ( defined $ip{$1} or defined $host{$2} ) {
            chomp;
            push @duplicates, $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}

print Dumper \@duplicates;
exit;

# I left out the DATA section just to keep this post short, but it's in the script
I ran it against my input file, which has 24951 lines, but I am getting an error message:
$ ./CheckDupDIP host1
Name "main::DATA" used only once: possible typo at ./CheckDupDIP line 6.
readline() on closed filehandle main::DATA at ./CheckDupDIP line 6.
$VAR1 = [];
Here is the script file:
$ cat CheckDupDIP
#!/opt/sa/bin/perl
use strict;
use warnings;
use Data::Dumper;
my (%ip, %host, @duplicates);
while (<DATA>) {
    if ( /^#?([\d.]+)\s+(\S+)/ ) {
        my ($ip, $host) = ($1, $2);
        if ( defined $ip{$1} or defined $host{$2} ) {
            chomp;
            push @duplicates, $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}
print Dumper \@duplicates;
exit;
use MIME::Lite;    # must be loaded before MIME::Lite->new is called

my (%ip, %host, $duplicates);
my $host_file = 'host1.txt'; # change path and filename as needed

open my $file, '<', $host_file or die "can't open $host_file $!";
while (<$file>) {
    if ( my ($ip, $host) = /^#?([\d.]+)\s+(\S+)/ ) {
        if ( defined $ip{$ip} or defined $host{$host} ) {
            $duplicates .= $_;
        }
        else {
            push @{$ip{$ip}}, $_;
            push @{$host{$host}}, $_;
        }
    }
}
close $file;

exit unless defined $duplicates;    # nothing to report, so send no mail

my $email_msg = <<EMAIL_MSG;
The following entries in the host file are duplicates either by IP address or by hostname.

$duplicates
EMAIL_MSG

my $email = MIME::Lite->new(
    From    => 'me@myhost.com',
    To      => 'you@yourhost.com',
    Cc      => 'some@other.com, some@more.com',
    Subject => 'Host file duplicates',
    Data    => $email_msg
);
$email->send;
Getting a compilation error:
$ ./CheckDupDIP
Global symbol "$duplicates" requires explicit package name at ./CheckDupDIP line 21.
Execution of ./CheckDupDIP aborted due to compilation errors.
If you want, we can reduce its memory usage, which should also make it more efficient, by using a counter instead of the HoA (hash of arrays).
Change the else clause to this:
else { $ip{$ip}++; $host{$host}++; }
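With that change the hashes only record that a key has been seen. A minimal sketch of the whole loop with the counter in place, reading from an in-memory sample instead of the real file. The sample lines and the find_duplicates name are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns every line whose IP or device name was already seen on an earlier line.
sub find_duplicates {
    my ($fh) = @_;
    my (%ip, %host);
    my $duplicates = '';
    while ( my $line = <$fh> ) {
        # Skip lines that don't look like "[#]IP<whitespace>name".
        my ($ip, $host) = $line =~ /^#?([\d.]+)\s+(\S+)/ or next;
        if ( $ip{$ip} or $host{$host} ) {
            $duplicates .= $line;
        }
        else {
            $ip{$ip}++;        # counter instead of an array of lines
            $host{$host}++;
        }
    }
    return $duplicates;
}

# Hypothetical sample data standing in for the hosts file.
my $sample = "10.0.0.1\talpha\n10.0.0.2\tbeta\n#10.0.0.1\tgamma\n10.0.0.3\tbeta\n";
open my $fh, '<', \$sample or die "can't open in-memory sample: $!";
my $report = find_duplicates($fh);
close $fh;

print $report;
```

Note that, as in the version above, a flagged line is not itself recorded, so a device name that first appears on a flagged line will not be caught if it shows up again later with a new IP.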
Another change, which would need to be benchmarked to see whether it is more efficient, would be to use the split() function instead of the regex. However, with a file this small and simple, we'd only be talking about fractions of a second.
Thank you. I need one more big favor: instead of the output showing both IP and host together, I would like to see the duplicate IPs only, and then the duplicate hosts only. Thanks.
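One way such a comparison might be set up, using the core Benchmark module. The parse_regex and parse_split names and the sample line are assumptions made for this sketch, and the split version does not check that the first field looks like an IP, so it would need extra validation before replacing the regex:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Regex version: same pattern as the script, returns ($ip, $host) or an empty list.
sub parse_regex {
    my ($line) = @_;
    return $line =~ /^#?([\d.]+)\s+(\S+)/;
}

# split version: strip an optional leading '#', then take the first two
# whitespace-separated fields. No validation of the IP field is done here.
sub parse_split {
    my ($line) = @_;
    $line =~ s/^#//;                          # changes only the local copy
    my ($ip, $host) = split ' ', $line, 3;
    return ($ip, $host);
}

# Hypothetical sample line in the IP<TAB>name<TAB>description format.
my $sample_line = "#10.0.0.1\talpha\tsome description\n";

# Run each parser a fixed number of times and print a comparison table.
cmpthese( 100_000, {
    regex => sub { my @fields = parse_regex($sample_line) },
    split => sub { my @fields = parse_split($sample_line) },
} );
```

The numbers will vary by machine and Perl version, so the table is only meaningful when run on the box where the cron job will live.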
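A rough sketch of how the two separate lists might be produced: count every IP and every device name independently, then report the ones seen more than once. The sample data and variable names below are made up for illustration, and the resulting $report string could be used as the body of the e-mail:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the hosts file.
my $sample = "10.0.0.1\talpha\n10.0.0.2\tbeta\n#10.0.0.1\tgamma\n10.0.0.3\tbeta\n";

my (%ip_count, %host_count);
open my $fh, '<', \$sample or die "can't open in-memory sample: $!";
while ( my $line = <$fh> ) {
    my ($ip, $host) = $line =~ /^#?([\d.]+)\s+(\S+)/ or next;
    $ip_count{$ip}++;        # count every occurrence, flagged or not
    $host_count{$host}++;
}
close $fh;

# Keep only the values that occur on more than one line.
my @dup_ips   = grep { $ip_count{$_} > 1 }   sort keys %ip_count;
my @dup_hosts = grep { $host_count{$_} > 1 } sort keys %host_count;

my $report = "Duplicate IPs:\n"
    . join( '', map { "  $_ ($ip_count{$_} entries)\n" } @dup_ips )
    . "\nDuplicate hosts:\n"
    . join( '', map { "  $_ ($host_count{$_} entries)\n" } @dup_hosts );

print $report;
```

Because every line is counted, an IP and a device name are each judged on their own, regardless of what the other column on the same line contains.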