comparing two files

Microsoft Windows server 2003
December 3, 2009 at 17:13:15
Specs: Windows XP
First I need to copy the contents of file 1 to file 2. Then, after a delay of 5 minutes (during which file 1 will be filling with new data), I need to:

compare the two files and get the difference between them. F1 is the base file. I need to find all the lines missing from F2 and overwrite F2 with those missing lines, then wait another 5 minutes, compare again, and push the unmatched data to F2, overwriting its current contents. I want to create a Perl script for this on Windows 2003.




#1
December 7, 2009 at 23:34:14
How does file1 fill: sequentially (all new data appended to EOF on a text-line basis) or randomly (based on data contents)? If sequentially, just get the line count of file1 at the time of the copy, then (after 5 minutes) go to that line and copy the rest to file2. If randomly (and text-based), you need to sort both files and do some further analysis. Also, if you just want the newest (5-minute cycle) data in file2 and not any previous data that was in file2 from file1, you might need a third temp file to hold all incrementals to file1.
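
For the sequential case, the line-count idea can be sketched as follows. This is only an illustration in Python (not the Perl the question asks for), and the function name `copy_new_lines` and the file names are hypothetical. It assumes file1 is only ever appended to and never truncated or rotated between cycles.

```python
from pathlib import Path


def copy_new_lines(source: Path, target: Path, lines_seen: int) -> int:
    """Overwrite `target` with the lines of `source` that come after `lines_seen`.

    Returns the current line count of `source`, to be passed back in
    as `lines_seen` on the next cycle.
    """
    with source.open("r", encoding="utf-8", errors="replace") as src:
        lines = src.readlines()
    # Everything past the previously recorded count is new data.
    with target.open("w", encoding="utf-8") as dst:
        dst.writelines(lines[lines_seen:])
    return len(lines)
```

A caller would start with `count = copy_new_lines(Path("file1.txt"), Path("file2.txt"), 0)`, wait 5 minutes, and then call it again with `count` as the last argument. If the process can be restarted between cycles, the count would have to be stored somewhere (a small state file, for example).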


#2
December 8, 2009 at 11:08:10
Thanks a lot for the support.

file1 fills sequentially; new lines are appended at EOF.

I want the script to work exactly the way you have understood it.

Below is a script that I created; it writes the differences to diff.txt.

But there are some problems with it:

Now I want the lines that got into diff.txt to be appended to sidnew.txt, so that the next time we run the script the comparison covers only the latest differences; it should not put the diffs found in the first run into diff.txt again.

Also, sir, when the script is run a second time it puts the differences in diff.txt, but it removes the first line that existed in diff.txt and duplicates it with the last uncommon line it found.

for example.

say sid.txt has data:
a
b
c
d

and say sidnew.txt has data:

a
b

when the script is run, the output in both sidnew.txt and diff.txt is:

dc

PROBLEM 1: the output is recursive and does not come out on new lines.

When the script is run a second time, after some new lines have been added (sid.txt is a syslog, so it is updated continuously, directly from the device):

say sid.txt is now:

a
b
c
d
e
f
g
h

and sidnew.txt is the one that was updated by the first run of the script:

dc

Now if I run the script, the output is:

sidnew.txt:

g
dc
e
c
f
b
ha
d

and diff.txt is:

g
dc
e
c
f
b
ha
d

But diff.txt should be:
e
f
g
h
And once we have the diffs in diff.txt, I want them appended to sidnew.txt, so that sidnew.txt looks like:

a
b
c
d
e
f
g
h

That way, the next time the script runs it compares only the latest differences (between runs the syslog will have filled with new lines).

So I don't know why sidnew.txt is showing jumbled output and is taking:

g
dc
e
c
f
b
ha
d

when it should take e f g h, and not a b c d again, as those already matched when the script ran the first time.

So I guess we need to append the diffs to sidnew.txt, because we cannot do anything to sid.txt (say, remove the matching lines), as the syslog is also used for other purposes.

Please help me with this, sir.

Below is the script that I am running:


Regards,


use strict;
use warnings;

my $f1      = 'c:\sid.txt';
my $f2      = 'c:\sidnew.txt';
my $outfile = 'c:\diff.txt';

my %results = ();

# Mark every line of the base file as seen once.
open FILE1, "$f1" or die "Could not open file: $! \n";
while (my $line = <FILE1>) {
    $results{$line} = 1;
}
close(FILE1);

# Increment the count for every line read from the second file.
open FILE2, "$f2" or die "Could not open file: $! \n";
while (my $line = <FILE2>) {
    $results{$line}++;
}
close(FILE2);

open (OUTFILE, ">$f2") or die "Cannot open $f2 for writing \n";
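
The rest of the script is not shown above, so the following is only a guess at the cause. Perl hashes do not return keys in insertion order, and the last line of a file may have no trailing newline; together those would produce output like "dc". A minimal sketch of the behaviour described in the post, written in Python for illustration (the function names `read_lines` and `sync_new_lines` are hypothetical), keeps the order of sid.txt and strips line endings before comparing:

```python
from pathlib import Path
from typing import List


def read_lines(path: Path) -> List[str]:
    """Return the lines of `path` with their line endings removed."""
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\r\n") for line in handle]


def sync_new_lines(base: Path, seen: Path, diff: Path) -> List[str]:
    """Write lines of `base` missing from `seen` to `diff`, then add them to `seen`."""
    seen_lines = read_lines(seen)
    already = set(seen_lines)
    # Walk the base file in order so the output is never jumbled.
    missing = [line for line in read_lines(base) if line not in already]
    diff.write_text("".join(line + "\n" for line in missing), encoding="utf-8")
    # Rewrite `seen` so that every line, old and new, ends with a newline.
    seen.write_text("".join(line + "\n" for line in seen_lines + missing),
                    encoding="utf-8")
    return missing
```

With sid.txt holding a through h and sidnew.txt holding a through d, this leaves e, f, g, h in diff.txt and a through h in sidnew.txt. Because the comparison is by line content, a new syslog line that is byte-for-byte identical to an earlier one would not be reported; counting lines instead avoids that.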



#3
December 8, 2009 at 20:59:12
I'm not really sure if this is your objective:

@echo off && setlocal enabledelayedexpansion
set /a Lc=0
:aa
set /a ct=0
echo off > sid2
for /f "tokens=* delims=" %%a in (sid1) do (
set /a ct+=1
if !ct! gtr !Lc! echo %%a >> sid2)
set Lc=!ct!

:: this was just my testing arrangement; I assume you already
:: have a 5-minute timer working?
:echo "waiting 5 min.s..."
:wait 3000
:pause
goto :aa

If the batch file is, or needs to be, killed between the 5-minute interrupts, you need to make sure that the variable Lc is not set local; you want it to persist. Disable or delete the line after @echo (set /a Lc=0) so Lc will pick up where it left off.

i know there's more to come... awaiting developments...



#4
December 11, 2009 at 10:03:44
Hi,

I tried to implement the code you posted, but could not tweak it.

Could you please modify the script I posted, or help me get the code right, so that the output comes out the way I want?
I appreciate your support.



#5
December 11, 2009 at 18:52:05
I see you have Perl. Did you search CPAN for a file diff module? Why don't you try Text::Diff? It can compute the differences between two files.

GNU win32 packages | Gawk



#6
December 11, 2009 at 19:00:49
OK, try this one. It includes a timer setup using "at". I left plenty of debugging dialog in place, so if it bombs (it probably will) you can check some of the settings. To start the batch file using AT from the command prompt you have to be very explicit (include the full path to the batch file and the ".bat" extension):
at 9:30 c:\batpath\timer.bat
and make sure the time you give is at least one minute later than the current system time, otherwise it gets scheduled for "tomorrow".

@echo off && setlocal enabledelayedexpansion
:: set all defaults, adapt as needed
set mpath=C:\workdir
set tpath=c:\workdir
set batpath=c:\workdir
set logpath=c:\workdir
set main=sid.txt
set test=sidnew.txt
set log=%logpath%\timer.log
::set log=nul --- enable for null logging
set syslog=%mpath%\%main%
set sysnew=%tpath%\%test%
set bat=%batpath%\timer.bat
set /a cycle=%2
set /a lc=%1
if "%2"=="" set /a cycle=5
if "%1"=="" (
set /a lc=0
set /a cycle=5
echo usage: AT hh:mm path\TIMER.BAT [init_lineskip] [cycle]
echo where init_lineskip is the no. of lines to skip first time, default=0
echo cycle is a number of minutes to wait between filechecks, default=5
echo booting with defaults! ctrl-C to abort bootup...
pause
)
set /a sk=%lc%
for /f %%b in ('find /c /v "" ^< %syslog%') do set lc=%%b
::all this is mainly for debugging. it can be taken out.
::or just make var. log = nul up at top (set log=nul)
echo 1 is %1 >> !log!
echo sk is %sk% >> !log!
echo system log file: !syslog! >> !log!
echo newlines file: !sysnew! >> !log!
echo cycle: !cycle! >> !log!
echo nextcycle lineskip is !lc! >> !log!
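::erase newlines file each cycle
echo off > !sysnew!
::"skip=0" is rejected by for /f, so only add the option when there are lines to skip
set skipopt=
if %sk% gtr 0 set skipopt=skip=%sk%
for /f "%skipopt% tokens=* delims=" %%a in (%syslog%) do echo %%a >> !sysnew!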
:: all the rest of this handles resetting the next cycle (adds
:: 5 minutes to the current time and resets "AT").
:: if you don't have task scheduling enabled, or if you have your
:: own timer arrangement, just remove all this part.
set tt=%time%
for /f "tokens=1-2 delims=:" %%c in ("!tt!") do (
set hh=%%c
set mm=%%d
)
echo !tt! >> !log!
if "%hh:~0,1%"=="0" set hh=%hh:~1%
if "%mm:~0,1%"=="0" set mm=%mm:~1%
echo !mm! >> !log!
set /a mm+=!cycle!
echo !mm! >> !log!
if !mm! gtr 59 (
set /a mm-=60
set /a hh+=1
if !hh! gtr 23 set /a hh=0
)
set newtime=!hh!:!mm!
echo next time is !newtime! >> !log!
at !newtime! !bat! !lc! !cycle!
:ex
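
The last part of the batch file adds the cycle length to the current time and wraps around at the hour and at midnight. As a cross-check of that arithmetic, here is a small Python sketch. The function name `next_run_time` is hypothetical, and the zero-padded HH:MM output is an assumption made for the sketch, not something the batch file produces.

```python
from datetime import datetime, timedelta


def next_run_time(now: datetime, cycle_minutes: int = 5) -> str:
    """Return the HH:MM time `cycle_minutes` after `now`, wrapping past midnight."""
    return (now + timedelta(minutes=cycle_minutes)).strftime("%H:%M")
```

For example, a run at 23:58 with a 5-minute cycle gives 00:03, which is the case the hour rollover in the batch file has to handle.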

