
Identifying duplicate records


Name: anilcgowda
Date: September 20, 2005 at 18:01:39 Pacific
OS: UNIX
CPU/Ram: 1GB
Comment:

Hi,

I have a file with duplicate records in it. I want to identify the duplicate records and put them into a separate file. Please let me know how to do it.

Please note that each record is 2 lines long and contains spaces.

Thanks,
Anil




Response Number 1
Name: nails
Date: September 21, 2005 at 08:57:12 Pacific
Reply:

I see no clear method of doing this. My idea is to place each 2-line record into a file, compare the files and delete the duplicates. Here is a 3 step kludge:

1) Create a mytmp directory in the directory where the file resides. Then use the Unix split command to break the file into 2-line files, one record per file:

split -l 2 database.txt mytmp/m1

All the file names will start with "m1".
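To make the split step concrete, here is a small sketch run on invented data. The file name sample.txt and the three records in it are assumptions made up for illustration; only the split command itself comes from the step above. It is written in plain sh, so it should also run under ksh.

```shell
#!/bin/sh
# Sketch only: sample.txt and its contents are invented for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three 2-line records; the first and third are identical.
printf '%s\n' \
  'john smith' '12 main st' \
  'jane doe' '9 oak ave' \
  'john smith' '12 main st' > sample.txt

mkdir mytmp

# -l 2 puts two lines (one record) into each output file.
# "mytmp/m1" is the name prefix, so the pieces are mytmp/m1aa, mytmp/m1ab, ...
split -l 2 sample.txt mytmp/m1

# List the pieces: one file per record.
ls mytmp
```

With the default two-letter suffix, split can only name 676 pieces. For a larger file, the standard -a option (for example -a 4) lengthens the suffix.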

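2) In ksh, compute the checksum of each file with cksum and sort the output by checksum, so that identical records end up on adjacent lines. Any file whose checksum and size repeat those of the previous line is a duplicate, so delete it: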

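#!/bin/ksh
# cksum prints "checksum size filename" for each piece. Sorting by checksum
# (then size) puts identical records on adjacent lines.
fr=0
cksum $(find mytmp -type f -name "m1*" -print) | sort -k 1,1n -k 2,2n |
while read c1 c2 n3
do
    # first line: nothing to compare against yet
    if [ "$fr" -eq 0 ]; then
        prevobj=$c1
        prevsize=$c2
        fr=1
        continue
    fi
    # same checksum and same size as the previous line: treat as a duplicate
    if [[ $prevobj -eq $c1 && $prevsize -eq $c2 ]]; then
        rm "$n3"
    else
        prevobj=$c1
        prevsize=$c2
    fi
done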
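The original question asked for the duplicates to go into a separate file rather than simply be deleted. The following is a sketch of that variation, not part of the reply itself. The file names dupes.txt and sums.txt and the sample records are assumptions invented for illustration, and the extra cmp check is an added safeguard because two different records could, in principle, share a checksum.

```shell
#!/bin/sh
# Sketch only: the sample pieces, dupes.txt and sums.txt are invented names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir mytmp
printf 'john smith\n12 main st\n' > mytmp/m1aa
printf 'jane doe\n9 oak ave\n'    > mytmp/m1ab
printf 'john smith\n12 main st\n' > mytmp/m1ac

: > dupes.txt          # start with an empty duplicates file
prevsum=
prevsize=
prevname=

# Sort by checksum, then size, then name, so the earliest piece of each
# group of identical records comes first and is the one that is kept.
cksum mytmp/m1* | sort -k 1,1n -k 2,2n -k 3,3 > sums.txt

while read -r sum size name
do
    # Same checksum and size as the kept piece, and cmp confirms the
    # contents really match: append the record to dupes.txt and drop it.
    if [ "$sum" = "$prevsum" ] && [ "$size" = "$prevsize" ] &&
       cmp -s "$prevname" "$name"; then
        cat "$name" >> dupes.txt
        rm "$name"
    else
        prevsum=$sum
        prevsize=$size
        prevname=$name
    fi
done < sums.txt
```

After this runs, mytmp holds one copy of each record and dupes.txt holds the extra copies.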

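3) Finally, put everything back together from mytmp. Sort the file names first, because find does not guarantee any particular order and the surviving records should stay in their original sequence:

find mytmp -type f -name "m1*" -print | sort | xargs cat > newdatabase.txt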

Sorry, it can't be cleaner.
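For comparison, one possible single-pass alternative uses awk to join each pair of lines into one key. This is a sketch under stated assumptions, not the method from the reply above: it assumes every record is exactly 2 lines, that all unique records fit in memory, and the names dupes.txt and the sample data are invented for illustration.

```shell
#!/bin/sh
# Sketch only: sample data and dupes.txt are invented for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  'john smith' '12 main st' \
  'jane doe' '9 oak ave' \
  'john smith' '12 main st' > database.txt

: > dupes.txt    # make sure the file exists even if no duplicates are found

awk -v dupfile=dupes.txt '
    # Odd line: hold the first half of the record and wait for the second.
    NR % 2 == 1 { first = $0; next }
    # Even line: the whole 2-line record is the lookup key.
    {
        key = first "\n" $0
        if (seen[key]++) {
            print key > dupfile     # seen before: goes to the duplicates file
        } else {
            print key               # first occurrence: goes to standard output
        }
    }
    # A trailing unpaired line is passed through rather than dropped.
    END { if (NR % 2 == 1) print first }
' database.txt > newdatabase.txt
```

Here newdatabase.txt receives the first copy of each record in its original order, and dupes.txt receives every later copy.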

