Computing.Net > Forums > Unix > removing duplicated files

removing duplicated files


Original Message
Name: tonytonybaker
Date: September 3, 2004 at 04:22:59 Pacific
Subject: removing duplicated files
OS: solaris
CPU/Ram: US-III 900 2Gb
Comment:

I have a Solaris system where PostScript (*.ps) files have been created in lots of different directories. Some of the file and directory names contain parentheses ( ).
A user has been distilling these files to *.pdf but not deleting the original *.ps files.
The disk space on my server is being used up fast because of this practice.
How can I use "find" to search the directory tree, find every *.ps file that has a matching *.pdf file, and remove the *.ps file?

TIA for any help.




Response Number 1
Name: Jim Boothe
Date: September 3, 2004 at 09:52:01 Pacific
Subject: removing duplicated files

You will need a script for this, but I need more information first:

Will the .ps and .pdf files always be in the same directory as each other, or can they be in different and unrelated directories?

Are we searching a specific list of directories, or a list of directory trees?

Regarding parentheses in the file names: Will a .ps and a .pdf filename always be identical, including any parentheses? Or maybe the .ps filename will have parentheses but they were omitted from the .pdf filename (or vice versa)?
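Parentheses mainly cause trouble when a file name is pasted into a generated command line, for example by a script that prints rm lines for ksh to run. The demo below is a sketch added for illustration, not from the thread: it assumes a POSIX sh and mktemp -d (which older Solaris releases may lack), and the file name report(1).ps is made up.

```shell
#!/bin/sh
# Hypothetical demo: feed generated "rm" lines to sh, with and without
# quotes, for a file name containing parentheses.
demo_parens() {
  dir=$(mktemp -d)
  : > "$dir/report(1).ps"

  # Unquoted: sh treats "(" as syntax, so the line fails to parse.
  if printf 'rm %s\n' "$dir/report(1).ps" | sh 2>/dev/null; then
    echo "unquoted: removed"
  else
    echo "unquoted: syntax error, nothing removed"
  fi

  # Single-quoted: the whole name is one word and reaches rm intact.
  printf "rm '%s'\n" "$dir/report(1).ps" | sh
  if [ ! -f "$dir/report(1).ps" ]; then
    echo "quoted: removed"
  fi

  rm -r "$dir"
}

demo_parens
```

Single quotes are enough here because none of the names contain a single quote themselves; a name that did would need further escaping.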



Response Number 2
Name: tonytonybaker
Date: September 7, 2004 at 01:14:46 Pacific
Subject: removing duplicated files

The file names are always identical apart from the suffix (*.ps or *.pdf).

The *.ps and *.pdf files will always be in the same directory as each other.

We are searching from one starting directory, but with many different paths under that starting point.
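With those constraints (identical names, same directory), the .ps partner of each .pdf is found simply by swapping the suffix, which is what the ${1%.pdf} expansion in the find command in this post does. A minimal sketch with a hypothetical helper name, ps_for_pdf; it needs a POSIX shell such as /usr/xpg4/bin/sh or ksh, because the old Bourne /bin/sh on Solaris does not support ${var%pattern}.

```shell
#!/bin/sh
# Hypothetical helper: print the .ps name that belongs to a .pdf path.
# ${1%.pdf} removes only a trailing ".pdf", so directory names and
# parentheses elsewhere in the path are left alone.
ps_for_pdf() {
  printf '%s\n' "${1%.pdf}.ps"
}

ps_for_pdf "./jobs (2004)/report(1).pdf"   # prints ./jobs (2004)/report(1).ps
ps_for_pdf "./pdf.pdf/final.pdf"           # prints ./pdf.pdf/final.ps
```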

Hope that helps; thanks for taking the trouble to reply.

I came up with the following "find" command:
/usr/xpg4/bin/find ./ -type f -name '*.pdf' -exec /usr/xpg4/bin/sh -c 'rm -f "${1%.pdf}.ps"' {} {} \; -print

but I haven't had time to test it yet.
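A dry run of the same idea lists the .ps files that would be removed without deleting anything; once the list looks right, the printf can become rm -f. This is a sketch, not the poster's command: list_dup_ps is a hypothetical function name, and it passes a placeholder sh as $0 with the file name in $1. The command in this post passes {} twice, so the name lands in both $0 and $1, which also works. Note that its trailing -print reports every .pdf found rather than the .ps files actually removed.

```shell
#!/bin/sh
# Dry run: print each .ps file that has a .pdf partner in the same
# directory. Nothing is deleted; replace the printf with rm -f "$ps"
# only after checking the list. On Solaris, use /usr/xpg4/bin/find and
# /usr/xpg4/bin/sh so that ${1%.pdf} is understood.
list_dup_ps() {
  find "$1" -type f -name '*.pdf' -exec sh -c '
    ps="${1%.pdf}.ps"
    if [ -f "$ps" ]; then
      printf "%s\n" "$ps"
    fi
  ' sh {} \;
}

# Try it on a scratch tree with made-up names.
tmp=$(mktemp -d)
mkdir -p "$tmp/jobs (2004)"
: > "$tmp/jobs (2004)/brochure.pdf"
: > "$tmp/jobs (2004)/brochure.ps"   # has a .pdf partner: listed
: > "$tmp/jobs (2004)/draft.ps"      # no .pdf: not listed
: > "$tmp/only.pdf"                  # no .ps: nothing to list
list_dup_ps "$tmp"                   # prints only the brochure.ps path
rm -r "$tmp"
```

Because -exec runs one sh per .pdf, quoting "$1" and "$ps" is all that is needed to cope with spaces and parentheses in the names.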



Response Number 3
Name: andyb1ack
Date: September 14, 2004 at 07:52:53 Pacific
Subject: removing duplicated files

Hi Tony,

I'm always twitchy about running commands or scripts without building them up and testing them gradually, and without a final sanity check of what they're going to do before I run them.

This is especially the case in a Production environment and where the rm command is used! :)

This might seem long-winded, but this is what I'd do:
1) create the script rm_dups.ksh as below;
2) run the script on your starting directory and check its output;
3) once satisfied that it's going to work as intended, run its output via
$ rm_dups.ksh <start_directory> |ksh -x
...not forgetting to send the output to a logfile, of course :)
Contents of script rm_dups.ksh:
#!/usr/bin/ksh

#-- Check that at least 1 parameter has been passed
if [[ $# -eq 0 ]]; then
    echo "# ERROR: no directories passed"
    exit 10
fi

#-- Loop for each directory passed ("$@" keeps names with spaces intact)
for DIR in "$@"
do

    date +"# %Y/%m/%d %H:%M:%S Removing duplicates from ${DIR} ..."

    #-- Check that directory $DIR exists and is a directory
    if [[ ! -d ${DIR} ]]; then
        echo "# WARNING: directory ${DIR} does not exist"

    else

        #-- Loop through each pdf file under $DIR, one name per line,
        #-- so names containing spaces are not split apart
        find "${DIR}" -type f -name "*.pdf" | while IFS= read -r FILE
        do
            echo
            date +"# %Y/%m/%d %H:%M:%S Processing file ${FILE} ..."

            #-- List file for reference
            ls -l "${FILE}" |sed -e 's/^/# /'

            #-- Same name and directory, with .ps instead of .pdf
            FILE2RM="${FILE%.pdf}.ps"
            if [[ -f ${FILE2RM} ]]; then
                ls -l "${FILE2RM}" |sed -e 's/^/# /'
                echo "# File ${FILE2RM} exists"
                #-- Quote the name so ksh does not parse ( ) when this
                #-- output is run (a name containing ' would still break)
                echo "rm '${FILE2RM}'"
            else
                echo "# File ${FILE2RM} does not exist"
            fi

        done # FILE

    fi

done # DIR

exit 0
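The numbered steps in this post can also be run through a saved plan file, so the commands that get executed are exactly the ones that were reviewed, with the trace kept in a log. A sketch, not part of the original reply: run_plan and the file names plan.ksh and rm_dups.log are hypothetical, the sanity check is coarse (it only looks at how each line starts), and it uses sh -x where the post uses ksh -x.

```shell
#!/bin/sh
# Hypothetical helper: run a reviewed plan file, logging the trace.
# Coarse sanity check first: every line must be blank, a "#" comment or
# start with "rm "; anything else and the plan is not run at all.
run_plan() {
  plan=$1
  log=$2
  if [ ! -f "$plan" ]; then
    echo "run_plan: $plan not found" >&2
    return 1
  fi
  if grep -v '^#' "$plan" | grep -v '^rm ' | grep -v '^$' > /dev/null; then
    echo "run_plan: $plan contains unexpected lines; not running it" >&2
    return 1
  fi
  # -x writes each command to stderr before running it; both go to $log.
  sh -x "$plan" > "$log" 2>&1
}

# Usage with the script above:
#   rm_dups.ksh <start_directory> > plan.ksh   # step 2: read plan.ksh first
#   run_plan plan.ksh rm_dups.log              # step 3: execute, keep a log
```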



Response Number 4
Name: andyb1ack
Date: September 14, 2004 at 07:56:06 Pacific
Subject: removing duplicated files

Hmmm... not sure what happened when I submitted that.

The two lines
echo
ec---------"

Were
echo "<h




Response Number 5
Name: andyb1ack
Date: September 14, 2004 at 07:57:30 Pacific
Subject: removing duplicated files

FFS!

Just delete those two lines from the script...

Soz.

