Compare the contents of two files

January 5, 2010 at 23:16:54
Specs: unix
Can someone help me find the differences between two files, using awk, grep or Perl?
Example:
Content of file a:
a
b
1
c
d
Content of file b:
1
4
5
c
a
The output should be written to file c:
a in both files
b only in file a
1 in both files
c in both files
d only in file a
4 only in file b
5 only in file b
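Since the question mentions grep, here is a minimal sketch that uses only the POSIX grep options -F (fixed strings), -x (match whole lines), -v (invert the match) and -f (read the patterns, one per line, from a file). The function names grep_compare and tag_lines are hypothetical. Lines keep the order they have in each file, but the output is grouped by category rather than interleaved as in the example above. On Solaris, /usr/xpg4/bin/grep may be needed if /usr/bin/grep rejects -F or -x.

```shell
#!/bin/sh
# tag_lines SUFFIX (hypothetical helper): append SUFFIX to every line read from stdin
tag_lines() {
    while IFS= read -r line; do
        printf '%s%s\n' "$line" "$1"
    done
}

# grep_compare FILE_A FILE_B (hypothetical name)
# Each line of one file is used as a fixed-string, whole-line pattern against the other.
grep_compare() {
    grep -Fx -f "$2" "$1"  | tag_lines " in both files"      # lines of A also in B
    grep -vFx -f "$2" "$1" | tag_lines " only in file $1"    # lines of A not in B
    grep -vFx -f "$1" "$2" | tag_lines " only in file $2"    # lines of B not in A
}
```

Usage: `grep_compare a b > c`. Each file is scanned once per pattern list, which is no problem for files of a few hundred lines.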



#1
January 6, 2010 at 08:22:52
Isn't the easiest way to sort them and then use diff on the two sorted files?
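A minimal sketch of that suggestion, assuming two input files such as a and b from the question. LC_ALL=C is set so both files are sorted with the same byte-order collation; sorted_diff and the temporary file names are hypothetical.

```shell
#!/bin/sh
# sorted_diff FILE_A FILE_B (hypothetical helper name)
# Sort both files the same way, then diff the sorted copies.
sorted_diff() {
    sa=${TMPDIR:-/tmp}/sorted_a.$$
    sb=${TMPDIR:-/tmp}/sorted_b.$$
    LC_ALL=C sort "$1" > "$sa"
    LC_ALL=C sort "$2" > "$sb"
    diff "$sa" "$sb"
    status=$?            # 0 = identical, 1 = differences found, >1 = trouble
    rm -f "$sa" "$sb"
    return $status
}
```

In the diff output, lines starting with `<` are only in the first file and lines starting with `>` are only in the second; lines that are not listed are in both.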


#2
January 7, 2010 at 01:51:35
diff can only compare the two files line by line, so even after sorting them, the output is still wrong.


#3
January 7, 2010 at 01:53:42
"diff can only compare per line" - aren't your input files also one
item per line? That's what your example shows.

#4
January 7, 2010 at 02:06:38
file a  file b
a       a
b       c
c       1
d       4
1       5

a is compared to a,
b is compared to c,
etc.
What I expected is that c would be reported as being in both files, but sort and diff give a different output.



#5
January 7, 2010 at 04:13:25
I see what you mean. However, if the files are similar enough, diff will detect (using your example) that b in file 1 is actually an inserted line, and that line 3 in file 1 is identical to line 2 in file 2. It will then proceed to compare file 1 line 4 with file 2 line 3, and so on.

So diff is not strictly line-by-line: it looks for the longest sequence of lines that both files have in common (in the same order) and reports everything else as inserted or deleted lines, so matching lines do not need to have the same line number in each file.
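To see this on the example from #4, the sketch below writes the two columns to files (the names diff_demo, f1 and f2 are arbitrary) and keeps only the lines diff marks as changed. Because diff aligns the common lines a, c and 1, c is not reported as a difference; only b and d (file 1) and 4 and 5 (file 2) show up.

```shell
#!/bin/sh
# diff_demo (hypothetical name): rebuild the two columns from #4 as files f1 and f2
# and print only the lines diff reports as changed.
diff_demo() {
    dir=${TMPDIR:-/tmp}/diff_demo.$$
    mkdir -p "$dir" || return 1
    printf 'a\nb\nc\nd\n1\n' > "$dir/f1"
    printf 'a\nc\n1\n4\n5\n' > "$dir/f2"
    # "<" = only in f1, ">" = only in f2; matched lines are not listed
    diff "$dir/f1" "$dir/f2" | grep '^[<>]'
    rm -rf "$dir"
}

diff_demo
```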



#6
January 7, 2010 at 08:50:14
In any case, even if diff works, I agree it's better to use something like awk. You may be able to read all lines of the first file into an awk associative array, using the whole line as the array key, and then read the second file and check each line if it exists in the array. If it exists, report it as being in both files, and delete it from the array. If it doesn't exist, report it as being in file 2 only. At the end, you will end up with an array containing those lines that are only in file 1. Sorry I haven't got time to write the code.
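A minimal sketch of that associative-array approach, assuming POSIX awk. It is arranged slightly differently from the description: file b is loaded into the array first, so that file a can then be printed in its own order, which reproduces the output requested in the question. The function name awk_compare is hypothetical. On Solaris, /usr/bin/awk is the old awk, so nawk or /usr/xpg4/bin/awk may be needed.

```shell
#!/bin/sh
# awk_compare FILE_A FILE_B (hypothetical helper name)
# Prints each line of FILE_A in order, tagged "in both files" or "only in file FILE_A",
# then the lines that appear only in FILE_B, in FILE_B order.
# On Solaris, run the same program with nawk or /usr/xpg4/bin/awk.
awk_compare() {
    awk '
    NR == FNR {                                # first file read: FILE_B
        fb = FILENAME
        if (!($0 in inb)) order[++n] = $0      # remember first-seen order
        inb[$0] = 1
        next
    }
    {                                          # second file read: FILE_A
        ina[$0] = 1
        if ($0 in inb) print $0 " in both files"
        else           print $0 " only in file " FILENAME
    }
    END {                                      # FILE_B lines never seen in FILE_A
        for (i = 1; i <= n; i++)
            if (!(order[i] in ina)) print order[i] " only in file " fb
    }' "$2" "$1"
}
```

Usage: `awk_compare a b > c`. The NR == FNR test assumes FILE_B is not empty; with an empty FILE_B every line of FILE_A would be taken as part of FILE_B. A line repeated in FILE_A is reported once per occurrence.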


#7
January 7, 2010 at 23:34:03
OK, no problem; thanks anyway for your answer. Maybe other people in this forum can help me.



#8
January 8, 2010 at 11:29:29
About how big are (or will be) the files?
Do they contain blank lines? Do they contain exclamation marks?
For small files, klint's approach is excellent.
I'm not good at regular expressions, so I'll put this attempted solution in batch.

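@echo off & setlocal enabledelayedexpansion
rem pass 1: report each line of file2 as "in both files" or "in file2 only"
set f1=file1
set f2=file2
call :cmpare both
endlocal
rem pass 2: swap the files and report only the lines that are in file1 only
setlocal enabledelayedexpansion
set f1=file2
set f2=file1
call :cmpare
endlocal
goto :eof

:cmpare
rem store each line of f1 as a variable name; the "_" prefix avoids clobbering PATH etc.
rem (lines containing spaces, "=" or "!" are not handled)
for /f "tokens=* delims=" %%a in (%f1%) do set "_%%a=1"
for /f "tokens=* delims=" %%b in (%f2%) do (
  if defined _%%b (
    if "%~1"=="both" >>results echo.%%b in both files
  ) else (
    >>results echo.%%b in %f2% only
  )
)
goto :eof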



#9
January 8, 2010 at 14:50:16
@nbrane: I think you haven't spotted the bit at the start of this
thread which said OS: unix.


#10
January 9, 2010 at 03:50:30
Yes, I am using UNIX (Solaris). Each file will contain about 500 lines. I appreciate your answer.


#11
January 9, 2010 at 05:47:11
I was hoping ghostdog would spot this post; he's a real wizard with Unix text utilities such as awk. I last used it so long ago that it would take me some time to get to grips with it again.

Report •

#12
January 15, 2010 at 17:58:04
Hi,
can someone help me with scripts?



#13
January 15, 2010 at 18:23:39
Unix/Linux, or Microsoft Windows?
(I took a look at "man bash" the other day. Whew: 5000+ lines, only about 9 pages, but "high-density".)


#14
January 17, 2010 at 05:00:28
Can't get it going. The script below reads one file, but then you want to match each line of file 1 against each line of file 2, and that is where it fails:


#!/bin/sh

if [ "$2" = "" ]
then
    echo
    echo "USAGE : $0 inputfile inputfile"
    echo
    exit 1
fi

infile1=$1
infile2=$2

innerloop()
{
    echo x
}

innerloop

awk '
{
    while (1)
    {
        print
        line1=$0
        innerloop   # awk cannot call the shell function; this is just an unset awk variable
        if (getline <= 0) break
    }
}' "$infile1"
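The script above cannot work as written because awk has no way to call a shell function such as innerloop. One way to keep the nested-loop idea is to run the inner loop inside awk itself: for each line of file 1, read file 2 from the top with getline, then close() it so the next pass starts over. This is a sketch assuming small files (the work grows with lines1 * lines2, fine for about 500 lines each); nested_compare and nested_report are hypothetical names. On Solaris, nawk or /usr/xpg4/bin/awk may be needed for getline from a file and -v.

```shell
#!/bin/sh
# nested_compare WALKED SEARCHED SHOW_BOTH (hypothetical helper name)
# For every line of WALKED, scan SEARCHED from the top for an identical line.
# SHOW_BOTH = 1 also prints the lines found in both files.
nested_compare() {
    awk -v f2="$2" -v both="$3" '
    {
        found = 0
        while ((getline line < f2) > 0)            # inner loop over SEARCHED
            if ((line "") == ($0 "")) { found = 1; break }   # force string comparison
        close(f2)                                  # rewind SEARCHED for the next line
        if (found) { if (both == 1) print $0 " in both files" }
        else       print $0 " only in file " FILENAME
    }' "$1"
}

# nested_report FILE1 FILE2 (hypothetical): common and FILE1-only lines, then FILE2-only lines
nested_report() {
    nested_compare "$1" "$2" 1
    nested_compare "$2" "$1" 0
}
```

Usage: `nested_report a b > c`. The `(line "") == ($0 "")` comparison stops awk from treating lines such as `1` and `01` as equal numbers.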



#15
January 17, 2010 at 17:32:54
I ran across this; it seems to do exactly what you want (on a Debian Linux server; the files must be sorted first):
COMM(1) User commands
NAME
comm - compare two sorted files line by line

SYNOPSIS
comm [OPTION]... FILE1 FILE2

DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line.

With no options, produce three-column output.
Column one contains lines unique to FILE1,
column two contains lines unique to FILE2,
and column three contains lines common to both files.
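comm is a standard POSIX utility, so it should also be available on Solaris. A minimal sketch of using it for the question's output: -1, -2 and -3 suppress the corresponding columns, so -12 leaves only the common lines, -23 only the lines unique to the first file, and -13 only those unique to the second. Both inputs are sorted with LC_ALL=C so they share one collation order; comm_compare, tag_lines and the temporary file names are hypothetical.

```shell
#!/bin/sh
# tag_lines SUFFIX (hypothetical helper): append SUFFIX to every line read from stdin
tag_lines() {
    while IFS= read -r line; do
        printf '%s%s\n' "$line" "$1"
    done
}

# comm_compare FILE_A FILE_B (hypothetical helper name)
comm_compare() {
    sa=${TMPDIR:-/tmp}/comm_a.$$
    sb=${TMPDIR:-/tmp}/comm_b.$$
    # comm requires both inputs sorted in the same collation order
    LC_ALL=C sort "$1" > "$sa"
    LC_ALL=C sort "$2" > "$sb"
    LC_ALL=C comm -12 "$sa" "$sb" | tag_lines " in both files"     # column 3 only
    LC_ALL=C comm -23 "$sa" "$sb" | tag_lines " only in file $1"   # column 1 only
    LC_ALL=C comm -13 "$sa" "$sb" | tag_lines " only in file $2"   # column 2 only
    rm -f "$sa" "$sb"
}
```

Usage: `comm_compare a b > c`. Because the inputs are sorted, the lines come out in sorted order within each group rather than in the original file order.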
