Hi,
I have a file with duplicate records in it. I want to identify the duplicate records and put them into a separate file. Please let me know how to do it.
Please note that each record is 2 lines in length and has spaces in each record.
Thanks,
Anil

I see no clear method of doing this. My idea is to place each 2-line record into a file, compare the files and delete the duplicates. Here is a 3 step kludge:
1) Create a mytmp directory in the directory where the file resides. Then use the Unix split command to break the file into 2-line files:
split -l 2 database.txt mytmp/m1
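All the file names will start with "m1". Note that split's default two-letter suffix allows only 676 pieces, so for a larger file add the -a option to lengthen the suffix.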
2) In ksh, compute the checksum of each file with cksum and sort the output by checksum. Any file whose checksum matches the one before it is a duplicate, so delete it:
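#!/bin/ksh
# Sort the per-record files by checksum so identical records end up adjacent,
# then delete every file whose checksum repeats the previous one.
fr=0
cksum $(find mytmp -type f -name "m1*" -print) | sort -k 1,1n |
while read c1 c2 n3
do
    # first line of input: just remember its checksum
    if [ "$fr" -eq 0 ]; then
        prevobj=$c1
        fr=1
        continue
    fi
    # same checksum as the previous file: treat it as a duplicate and remove it
    if [[ $prevobj -eq $c1 ]]; then
        rm "$n3"
    else
        prevobj=$c1
    fi
done
3) Finally, put everything back together from mytmp: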
cat $(find mytmp -type f -name "m1*" -print | sort) > newdatabase.txt
Sorry, it can't be cleaner.
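
If awk is available, a single pass might also work, and it can write the duplicates to their own file as the question asks. This is only a sketch, not tested against the real data: the function name split_dups and the three file arguments are made up for illustration, and it assumes every record is exactly 2 lines and that two records are duplicates only when both lines match exactly. It compares the record text itself rather than a checksum.

```shell
#!/bin/sh
# split_dups INPUT UNIQUE_OUT DUPS_OUT   (hypothetical helper, names are assumptions)
# The first occurrence of each 2-line record goes to UNIQUE_OUT,
# every later occurrence goes to DUPS_OUT. Input order is preserved.
split_dups() {
    # create/empty both outputs so they exist even if nothing is written to one
    : > "$2"
    : > "$3"
    awk -v uniq_out="$2" -v dups_out="$3" '
        # odd-numbered line: remember the first half of the record
        NR % 2 == 1 { first = $0; next }
        # even-numbered line: the record is now complete
        {
            key = first "\n" $0
            if (seen[key]++) {
                print key > dups_out
            } else {
                print key > uniq_out
            }
        }
        # a trailing unpaired line (odd line count) is passed through unchanged
        END { if (NR % 2 == 1) print first > uniq_out }
    ' "$1"
}
```

For example, split_dups database.txt newdatabase.txt duplicates.txt would leave the de-duplicated records in newdatabase.txt and the repeats in duplicates.txt. The whole file's distinct records are held in memory, so for a very large file the split-and-cksum approach may still be the safer choice.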
