find duplicate files
|
Original Message
|
Name: uribo
Date: May 15, 2003 at 22:22:44 Pacific
Subject: find duplicate files
OS: Solaris
CPU/Ram: -
|
Comment: Hi, I want to find all possible duplicate files in a directory. They need not have the same filename. Can anyone help? Thanks.
|
|
Response Number 1
|
Name: Swapan Satpathi
Date: May 16, 2003 at 14:38:29 Pacific
|
Reply: You can try this:

#!/bin/sh
# Compare every pair of same-sized files in the current directory.
x=0
# Record "size|name" per entry; the "total" line has no 5th field and is skipped.
ls -lrt | awk '!($5 == "") {print $5"|"$9}' > file_list
for i in `cat file_list`
do
    isize=`echo $i | awk -F'|' '{print $1}'`
    ifile=`echo $i | awk -F'|' '{print $2}'`
    for j in `cat file_list`
    do
        jsize=`echo $j | awk -F'|' '{print $1}'`
        jfile=`echo $j | awk -F'|' '{print $2}'`
        if [ "$isize" = "$jsize" ]
        then
            if [ "$ifile" != "$jfile" ]
            then
                # diff prints nothing when the two files are identical
                dix=`diff "$ifile" "$jfile" | wc -l`
                if [ $dix -eq 0 ]
                then
                    echo "file $jfile is duplicate of $ifile"
                    x=1
                fi
            fi
        fi
    done
done
if [ $x -eq 0 ]
then
    echo "There is no duplicate file"
fi
exit 0
|
|
Response Number 2
|
Name: nails
Date: May 16, 2003 at 16:33:37 Pacific
|
Reply: Hi:

There's always more than one way of doing something. Solaris has a cksum command which returns a 32-bit CRC of a file. You can almost guarantee that files with the same CRC will be identical. Here's what I do:

1) Save the checksum of everything in the directory into a file, cksum.file.
2) Determine from that file which checksums occur more than once, saving them to dupcksum.txt.
3) Sort dupcksum.txt, eliminating the duplicates, and save the result to udup.txt.
4) For each entry in udup.txt, grep cksum.file and print field 3, the file name.

This works well if you have more than two copies of the same file.

Regards,
Nails

#!/bin/ksh
fr=0
cksum * | sort -k 1,2 > cksum.file
while read c1 c2 n3
do
    if [ fr -eq 0 ]; then
        ck1=$c1
        fr=1
        continue
    fi
    # save check sums that are duplicate
    if [[ $ck1 -eq $c1 ]]; then
        echo $c1 >> dupcksum.txt
    fi
    ck1=$c1
done < cksum.file

# what if there's more than one copy
sort -u -o udup.txt dupcksum.txt
rm -r dupcksum.txt
if [ -f udup.txt ]; then
    while read nck
    do
        echo "duplicate files:"
        grep $nck cksum.file | awk ' { print $3 }'
    done < udup.txt
    rm udup.txt
fi
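The four steps above can also be collapsed into one pipeline that needs no temporary files. This is only a sketch, not Nails' script: the function name `list_dup_groups` is made up for illustration, it treats files as duplicates when both the CRC and the byte count printed by cksum match, and it assumes file names contain no whitespace.

```shell
#!/bin/sh
# list_dup_groups DIR (hypothetical name): print each group of files in DIR
# whose cksum CRC and byte count are identical.
# Assumption: file names contain no whitespace.
list_dup_groups() {
    (
        cd "$1" || exit 1
        # cksum prints "CRC SIZE NAME"; sorting puts equal checksums next
        # to each other. Errors for subdirectories are discarded.
        cksum * 2>/dev/null | sort -k 1,1n -k 2,2n -k 3 |
        awk '
            { key = $1 " " $2 }
            key == prev {
                # First repeat in a group: print the header and the
                # first member, which was only remembered so far.
                if (!shown) {
                    print "duplicate files:"
                    print "  " prevname
                    shown = 1
                }
                print "  " $3
                next
            }
            { prev = key; prevname = $3; shown = 0 }
        '
    )
}
```

Calling `list_dup_groups .` prints one "duplicate files:" header per group, followed by every file in that group.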
|
|
Response Number 3
|
Name: hoichoi
Date: May 19, 2003 at 18:35:42 Pacific
|
Reply: Hi Nails, your code looks fine, but is it possible for you to do it in Bourne shell? I'm having difficulties running this script.
|
|
Response Number 4
|
Name: nails
Date: May 21, 2003 at 17:42:23 Pacific
|
Reply: Hi:

Only two little changes:

1) Forgot a $ on the fr -eq 0 if statement. ksh doesn't care.
2) Removed the double [[.

Regards,
Nails

#!/bin/sh
fr=0
cksum * | sort -k 1,2 > cksum.file
while read c1 c2 n3
do
    if [ $fr -eq 0 ]; then
        ck1=$c1
        fr=1
        continue
    fi
    # save check sums that are duplicate
    if [ $ck1 -eq $c1 ]; then
        echo $c1 >> dupcksum.txt
    fi
    ck1=$c1
done < cksum.file

# what if there's more than one copy
sort -u -o udup.txt dupcksum.txt
rm -r dupcksum.txt
if [ -f udup.txt ]; then
    while read nck
    do
        echo "duplicate files:"
        grep $nck cksum.file | awk ' { print $3 }'
    done < udup.txt
    rm udup.txt
fi
|
|
Response Number 5
|
Name: hoichoi
Date: May 23, 2003 at 21:46:17 Pacific
|
Reply: Hi Nails,

Thanks for your help. I would like one more option: count how many duplicates each file has instead of showing the duplicate files. Is that possible? For example, if a.txt has 3 duplicate files, it would show "a.txt has 3 duplicate files" instead of showing the file names.

Regards,
HoiChoi
|
|
Response Number 6
|
Name: nails
Date: May 24, 2003 at 12:41:42 Pacific
|
Reply: HoiChoi:

I'm not certain of your question. Pull out the -u switch on the sort command above, and you'll have all the duplicates in a row. Which file is the one where you say, this "file" has 3 duplicates?

Regards,
Nails
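One possible reading of HoiChoi's request: treat the first file name in each sorted checksum group as "the file" and report how many other files match it. The sketch below follows that reading. The function name `count_dups` and the output wording are illustrative assumptions, and file names are again assumed to contain no whitespace.

```shell
#!/bin/sh
# count_dups DIR (hypothetical name): for each group of files in DIR with
# the same cksum CRC and byte count, print the first name in sort order
# and the number of other files that match it.
count_dups() {
    (
        cd "$1" || exit 1
        cksum * 2>/dev/null | sort -k 1,1n -k 2,2n -k 3 |
        awk '
            { key = $1 " " $2 }
            key != prev {
                # A new checksum starts: report the finished group, if it
                # had any duplicates, then begin counting again.
                if (n > 0) print first " has " n " duplicate(s)"
                prev = key; first = $3; n = 0
                next
            }
            { n++ }
            END { if (n > 0) print first " has " n " duplicate(s)" }
        '
    )
}
```

Which name is reported for a group depends only on sort order, so "the file" here is a convention rather than a known original.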
|
|
Response Number 7
|
Name: Aki
Date: May 27, 2003 at 21:44:19 Pacific
|
Reply: Hi Nails,

How can we modify the above program to go through each subdirectory in the tree, check whether any duplicates exist further down, and then show all the possible duplicate copies? Also, what is the usage of this program? If you have time, can you please go through the code in more detail (I am just a beginner)?

Also, what does the line "done >cksum.file" do? I understand it directs output to cksum.file, but how? You have removed the temporary files. Also, if more than one instance of the same program is running at the same time, how can we make sure that they do not overwrite each other's temporary files?

I hope these are not too many questions. If so, please excuse me.

Thanks.
Aki.
|
|
Response Number 8
|
Name: nails
Date: May 28, 2003 at 13:31:22 Pacific
|
Reply: Hi:

I might be able to look at this tomorrow, but I'll answer some questions now:

>Also what does the line, "done >cksum.file"
>do. I understand it directs output to
>cksum.file but how. You have removed the
>temporary files.

In the example above, don't use the greater-than sign; use the less-than sign, which redirects input, not output. This forum doesn't print the less-than sign; the line should read: done < cksum.file. The while loop then reads its input from cksum.file.

>Also if more than one instance of the same
>program is running at the same time, how
>can we make sure that they do not write
>each other's temporary files

One of the things you can do is use the process ID, $$, to create temp files:

fn=chksum.file.$$

Now fn will have a numeric extension: the process ID of the script that is executing.

Regards,
Nails
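To make the $$ suggestion concrete, here is a minimal sketch of per-process temp file names with cleanup on exit. The helper name `make_tmp_name`, the use of TMPDIR with a /tmp fallback, and the trap handling are assumptions added for illustration; they are not part of the scripts in this thread. Names built from $$ are predictable, so where a `mktemp` command is available it is generally the safer choice.

```shell
#!/bin/sh
# make_tmp_name BASE (hypothetical helper): print a temp file path that
# ends in this script's process ID, so two running copies of the script
# use different files.
make_tmp_name() {
    echo "${TMPDIR:-/tmp}/$1.$$"
}

fn=`make_tmp_name cksum.file`
dupfn=`make_tmp_name dupcksum.txt`

# Remove both files when the script exits (trap on 0), and turn the
# usual termination signals into a normal exit so the cleanup still runs.
trap 'rm -f "$fn" "$dupfn"' 0
trap 'exit 1' 1 2 15

# Create the checksum file empty; the rest of the script would write
# cksum output to "$fn" and duplicate checksums to "$dupfn".
: > "$fn"
```

Every later reference to cksum.file or dupcksum.txt would then use "$fn" or "$dupfn" instead of a fixed name.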
|
|
Response Number 9
|
Name: nails
Date: May 29, 2003 at 09:56:44 Pacific
|
Reply:

#!/bin/ksh
# execute cksum on all files, sort numeric on the first field.
# since all duplicates are in order check each one in turn.
fr=0
cksum * | sort -k 1,1n |
while read c1 c2 n3
do
    if [ $fr -eq 0 ]; then
        prevobj=$c1
        prevfile=$n3
        fr=1
        continue
    fi
    # report files whose check sum matches the current reference file
    if [[ $prevobj -eq $c1 ]]; then
        echo "file $prevfile has duplicate $n3"
    else
        prevobj=$c1
        prevfile=$n3
    fi
done
|
|
Response Number 10
|
Name: Aki
Date: May 29, 2003 at 19:18:19 Pacific
|
Reply: Hi Everyone,

Nails, I have tried your script with the (set -x) option. It compares each file with the next file and after that makes no comparisons with the rest of the files. I mean it will compare file1 with file2, file2 with file3, and so on, but it will not compare file1 with any file except file2. So it will output sets of two identical files. How can I modify it to output all the identical files together? And how can the script be made to search the whole tree under the given directory?

Thanks.
Aki.
|
|
Response Number 11
|
Name: nails
Date: May 30, 2003 at 09:46:26 Pacific
|
Reply: Aki:

To answer your first question: It's not accurate to say it "compares each file with the next file". It compares a file, call it filea, with each following file until the check sum changes. filea doesn't change until the check sum changes; that way you pick up multiple duplicates. This works because the check sums are numerically sorted, so there's no reason to check each file against every other one.

To answer your second question: To compare all the files in the directory structure, perform a find and checksum all the files:

#!/bin/ksh
# execute cksum on all files, sort numeric on the first field.
# since all duplicates are in order check each one in turn. Find
# all the duplicates in the tree
fr=0
cksum $(find . -type f -print) | sort -k 1,1n |
while read c1 c2 n3
do
    if [ $fr -eq 0 ]; then
        prevobj=$c1
        prevfile=$n3
        fr=1
        continue
    fi
    # report files whose check sum matches the current reference file
    if [[ $prevobj -eq $c1 ]]; then
        echo "file $prevfile has duplicate $n3"
    else
        prevobj=$c1
        prevfile=$n3
    fi
done
exit 0

Regards,
Nails
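A variation on the tree search, sketched under a few assumptions. It lets find hand the file names to cksum in batches with -exec ... +, which avoids overflowing the command line on large trees, and it confirms each candidate with cmp, because two different files can share a 32-bit CRC. The function name `find_dups_tree` is made up for illustration, and file names are assumed to contain no newlines or backslashes.

```shell
#!/bin/sh
# find_dups_tree DIR (hypothetical name): report files anywhere under DIR
# that are byte-for-byte identical to an earlier file in sorted order.
# Assumption: file names contain no newlines or backslashes.
find_dups_tree() {
    prevkey=
    prevfile=
    find "$1" -type f -exec cksum {} + |
    sort -k 1,1n -k 2,2n -k 3 |
    while read crc size name
    do
        if [ "$crc $size" != "$prevkey" ]; then
            # Checksum or size changed: this file becomes the reference.
            prevkey="$crc $size"
            prevfile=$name
        elif cmp -s "$prevfile" "$name"; then
            # Same CRC and size, and cmp found no differing byte.
            echo "file $prevfile has duplicate $name"
        fi
    done
}
```

The reference file stays fixed until the checksum or size changes, so three identical files produce two lines that both name the same first file.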