Name: pnbalaji Date: March 19, 2008 at 18:12:56 Pacific Subject: Find 5000 oldest files from folder OS: AIX 5.3 CPU/Ram: 8GB
Comment:
Hi,
I have a directory /logo/jpg/CustArtRepository which contains around a million files. My requirement is to find the oldest 5000 files in this directory and its subdirectories and move them to an archive directory. I am using the following command.
find /logo/jpg/CustArtRepository -type f -name "*" -exec ls -ltr {} \; 2>/dev/null
However, the above command takes a very long time since there are nearly 10000 subdirectories under this directory. I would like this search to end as soon as it has found its first 5000 files.
Using the -exec option with find in this way is very inefficient. It spawns a new process for each file found. Haven't had time to test it, but try this:
# untested
find . -type f -print | xargs ls -ltr | head -5000
Be aware that the above command breaks if any file name contains spaces. If that's an issue, let me know; I probably have another solution available that will handle spaces.
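One way to avoid both the process-per-file cost and the trouble with spaces is the `-exec ... {} +` form of find, which POSIX specifies. Whether the find on a given AIX level supports it is an assumption to check first. The scratch directory and file names below are made up for the demonstration:

```shell
#!/bin/sh
# Scratch directory with hypothetical file names, one containing a space.
demo="${TMPDIR:-/tmp}/find_demo.$$"
mkdir -p "$demo/sub dir"
touch "$demo/a.jpg" "$demo/sub dir/b c.jpg"

# '{} +' hands many names to a single ls, much as xargs does, but each
# name is passed as its own argument, so embedded spaces survive.
listing=$(find "$demo" -type f -exec ls -l {} +)
printf '%s\n' "$listing"

# ls prints one line per file, so the line count equals the file count.
count=$(printf '%s\n' "$listing" | wc -l | tr -d ' ')
echo "files listed: $count"

rm -rf "$demo"
```

Note that `{} +` still splits a very long list of names across several ls invocations, so it does not by itself give one globally sorted listing.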
I am searching for the jpg files in the directory. I just found that the above command seems to have some issues.
Issue 1: I am getting the message "xargs: 0402-057 The ls command was not found or could not be run" when I run it from the command line. I am not getting this message when I run it from the script.
Issue 2: I could see that the files were not in sorted order. I got the listing as below.
2004 files first
2005 files next
2006 files next
2007 files next
2008 files next (note: timestamp format is different for the files created in last 6 months)
2002 files next
I am pasting the example listing below.
=============================================
-rw-rw-r-- 1 logoview samba 1548 Aug 29 2007 /logo/jpg/CustArtRepository/00/60/0081160_CA003_82.jpg
-rw-rw-r-- 1 logoview samba 242394 Sep 12 2007 /logo/jpg/CustArtRepository/00/19/0024319_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 12102 Sep 12 2007 /logo/jpg/CustArtRepository/00/19/0024319_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 297192 Sep 12 2007 /logo/jpg/CustArtRepository/00/19/0024319_CA002_300.jpg
-rw-rw-r-- 1 logoview samba 16317 Sep 12 2007 /logo/jpg/CustArtRepository/00/19/0024319_CA002_82.jpg
-rw-rw-r-- 1 logoview samba 53070 Oct 05 16:35 /logo/jpg/CustArtRepository/00/89/0069889_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 10220 Oct 05 16:36 /logo/jpg/CustArtRepository/00/89/0069889_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 88195 Oct 12 09:31 /logo/jpg/CustArtRepository/00/19/0021219_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 9426 Oct 12 09:31 /logo/jpg/CustArtRepository/00/19/0021219_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 89545 Oct 12 09:31 /logo/jpg/CustArtRepository/00/19/0021219_CA002_300.jpg
-rw-rw-r-- 1 logoview samba 9674 Oct 12 09:32 /logo/jpg/CustArtRepository/00/19/0021219_CA002_82.jpg
-rw-rw-r-- 1 logoview samba 78449 Oct 17 09:15 /logo/jpg/CustArtRepository/01/88/0154488_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 7846 Oct 17 09:15 /logo/jpg/CustArtRepository/01/88/0154488_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 593828 Oct 19 15:07 /logo/jpg/CustArtRepository/00/62/0017562_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 34373 Oct 19 15:07 /logo/jpg/CustArtRepository/00/62/0017562_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 204328 Oct 31 12:09 /logo/jpg/CustArtRepository/00/07/0023607_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 11892 Oct 31 12:09 /logo/jpg/CustArtRepository/00/07/0023607_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 179595 Oct 31 12:09 /logo/jpg/CustArtRepository/00/07/0023607_CA002_300.jpg
-rw-rw-r-- 1 logoview samba 18503 Oct 31 12:09 /logo/jpg/CustArtRepository/00/07/0023607_CA002_82.jpg
-rw-rw-r-- 1 logoview samba 64677 Nov 01 14:24 /logo/jpg/CustArtRepository/00/25/0011225_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 3854 Nov 01 14:24 /logo/jpg/CustArtRepository/00/25/0011225_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 265406 Nov 05 13:00 /logo/jpg/CustArtRepository/00/07/0042807_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 12741 Nov 05 13:00 /logo/jpg/CustArtRepository/00/07/0042807_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 127399 Nov 06 13:49 /logo/jpg/CustArtRepository/00/51/0094851_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 10261 Nov 06 13:50 /logo/jpg/CustArtRepository/00/51/0094851_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 172596 Nov 19 18:07 /logo/jpg/CustArtRepository/01/54/0164154_CA002_300.jpg
-rw-rw-r-- 1 logoview samba 11055 Nov 19 18:07 /logo/jpg/CustArtRepository/01/54/0164154_CA002_82.jpg
-rw-rw-r-- 1 logoview samba 189231 Nov 20 16:31 /logo/jpg/CustArtRepository/00/95/0045795_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 15924 Nov 20 16:31 /logo/jpg/CustArtRepository/00/95/0045795_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 487869 Nov 20 18:17 /logo/jpg/CustArtRepository/00/42/0027542_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 32740 Nov 20 18:18 /logo/jpg/CustArtRepository/00/42/0027542_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 77053 Nov 27 12:08 /logo/jpg/CustArtRepository/01/81/0137981_CA003_300.jpg
-rw-rw-r-- 1 logoview samba 9650 Nov 27 12:09 /logo/jpg/CustArtRepository/01/81/0137981_CA003_82.jpg
-rw-rw-r-- 1 logoview samba 200000 Nov 30 10:30 /logo/jpg/CustArtRepository/00/73/0091973_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 15581 Nov 30 10:30 /logo/jpg/CustArtRepository/00/73/0091973_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 166244 Nov 30 10:31 /logo/jpg/CustArtRepository/00/78/0011078_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 11965 Nov 30 10:31 /logo/jpg/CustArtRepository/00/78/0011078_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 103391 Jan 08 09:34 /logo/jpg/CustArtRepository/00/48/0090748_CA004_300.jpg
-rw-rw-r-- 1 logoview samba 12464 Jan 08 09:34 /logo/jpg/CustArtRepository/00/48/0090748_CA004_82.jpg
-rw-rw-r-- 1 logoview samba 103391 Jan 17 14:33 /logo/jpg/CustArtRepository/00/48/0090748_CA005_300.jpg
-rw-rw-r-- 1 logoview samba 12464 Jan 17 14:33 /logo/jpg/CustArtRepository/00/48/0090748_CA005_82.jpg
-rw-rw-r-- 1 logoview samba 197371 Jan 17 14:34 /logo/jpg/CustArtRepository/00/48/0090748_CA006_300.jpg
-rw-rw-r-- 1 logoview samba 19881 Jan 17 14:34 /logo/jpg/CustArtRepository/00/48/0090748_CA006_82.jpg
-rw-rw-r-- 1 logoview samba 238869 Jan 17 14:57 /logo/jpg/CustArtRepository/00/04/0045804_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 34964 Jan 17 14:58 /logo/jpg/CustArtRepository/00/04/0045804_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 311330 Jan 29 12:18 /logo/jpg/CustArtRepository/00/40/0034040_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 18648 Jan 29 12:18 /logo/jpg/CustArtRepository/00/40/0034040_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 43488 Jan 29 14:18 /logo/jpg/CustArtRepository/00/91/0059691_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 3698 Jan 29 14:18 /logo/jpg/CustArtRepository/00/91/0059691_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 138629 Jan 30 13:49 /logo/jpg/CustArtRepository/00/18/0087518_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 12719 Jan 30 13:49 /logo/jpg/CustArtRepository/00/18/0087518_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 86455 Feb 08 08:57 /logo/jpg/CustArtRepository/01/95/0176195_CA002_300.jpg
-rw-rw-r-- 1 logoview samba 4329 Feb 08 08:57 /logo/jpg/CustArtRepository/01/95/0176195_CA002_82.jpg
-rw-rw-r-- 1 logoview samba 284003 Feb 28 10:12 /logo/jpg/CustArtRepository/00/94/0091094_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 41525 Feb 28 10:12 /logo/jpg/CustArtRepository/00/94/0091094_CA001_82.jpg
-rw-rw-r-- 1 logoview samba 555157 Mar 12 07:47 /logo/jpg/CustArtRepository/00/77/0042677_CA001_300.jpg
-rw-rw-r-- 1 logoview samba 31795 Mar 12 07:47 /logo/jpg/CustArtRepository/00/77/0042677_CA001_82.jpg
-rw-r--r-- 1 logoview samba 11885 Nov 06 2002 /logo/jpg/CustArtRepository/02/90/0280390_CA002_82.jpg
-rw-r--r-- 1 logoview samba 167142 Nov 06 2002 /logo/jpg/CustArtRepository/02/90/0281490_CA001_300.jpg
-rw-r--r-- 1 logoview samba 11700 Nov 06 2002 /logo/jpg/CustArtRepository/02/90/0281490_CA001_82.jpg
-rw-r--r-- 1 logoview samba 49845 Nov 06 2002 /logo/jpg/CustArtRepository/02/90/0281690_CA001_300.jpg
-rw-r--r-- 1 logoview samba 2550 Nov 06 2002 /logo/jpg/CustArtRepository/02/90/0281690_CA001_82.jpg
-rw-r--r-- 1 logoview samba 167066 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0281990_CA001_300.jpg
-rw-r--r-- 1 logoview samba 10046 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0281990_CA001_82.jpg
-rw-r--r-- 1 logoview samba 137824 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0281990_CA002_300.jpg
-rw-r--r-- 1 logoview samba 7564 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0281990_CA002_82.jpg
-rw-r--r-- 1 logoview samba 181768 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0282190_CA001_300.jpg
-rw-r--r-- 1 logoview samba 13567 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0282190_CA001_82.jpg
-rw-r--r-- 1 logoview samba 216742 Nov 09 2002 /logo/jpg/CustArtRepository/02/90/0282390_CA001_300.jpg
-rw-r--r-- 1 logoview samba 15446 Nov 09 2002 /logo/jpg/CustArtRepository/02/90/0282390_CA001_82.jpg
-rw-r--r-- 1 logoview samba 171591 Nov 11 2002 /logo/jpg/CustArtRepository/02/90/0282390_CA002_300.jpg
-rw-r--r-- 1 logoview samba 13453 Nov 11 2002 /logo/jpg/CustArtRepository/02/90/0282390_CA002_82.jpg
-rw-r--r-- 1 logoview samba 15501 Nov 11 2002 /logo/jpg/CustArtRepository/02/90/0282390_CA003_300.jpg
-rw-r--r-- 1 logoview samba 4237 Nov 11 2002 /logo/jpg/CustArtRepository/02/90/0282390_CA003_82.jpg
=============================================
Remember what xargs does for a living. If there are too many filenames to fit on one command line, xargs constructs as many command lines as it needs to process them all. The filenames output by each command line will be sorted by date, but the overall list will not be. Only in the special case where there are so few filenames that one command line is enough will this work.
=============================================
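The batching effect described above can be reproduced on a small scale. This is only an illustration: `-n 2` forces tiny batches to stand in for the argument-length limit, and plain numbers stand in for file dates.

```shell
#!/bin/sh
# Each xargs batch of two values is sorted on its own, so the combined
# output is sorted only within each batch, not overall.
batched=$(printf '%s\n' 3 1 4 2 |
    xargs -n 2 sh -c 'printf "%s\n" "$@" | sort -n' sh |
    paste -s -d ' ' -)
echo "batched: $batched"

# Sorting once, after everything has been collected, gives a global order.
global=$(printf '%s\n' 3 1 4 2 | sort -n | paste -s -d ' ' -)
echo "global:  $global"
```

The first line shows two sorted pairs back to back, which is the same pattern as the 2004 to 2008 files followed by the 2002 files in the listing.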
Since my file listing is very large, xargs doesn't seem to work.
OK, I can see that the sort part of the ls command will get messed up; xargs won't let the ls command overflow. In that case, why not eliminate the sort and send the find output to a file:
# untested
find . -type f -print | xargs ls -l > myfile.txt
Once you have a file with all the filenames, sort on the 5th field:
Now, with the 5000 file names in another file, you should be able to process them. It's a kludge, but it might be faster than your original find command.
Sorting on the fifth field will not work. See the output of the ls -l command below.
-rw-r--r-- 1 logoview samba 13567 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0282190_CA001_82.jpg
Field #1 : File permissions
Field #2 : Link count (1 for every file here)
Field #3 : File owner
Field #4 : Group owner
Field #5 : File size
Field #6 : Month in MMM format
Field #7 : Date in DD format
Field #8 : Year in CCYY format
Field #9 : File name
Now, my requirement becomes "sort the file using Field #6, 7, 8 so that the file dates are in ascending order".
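A minimal sketch of such a sort, assuming field 8 really is a four-digit year (true only for files older than about six months) and that month names are in English. The helper name `sort_by_date` is made up; it avoids relying on a month-aware sort option, which not every sort implementation has, by building a numeric key with awk:

```shell
#!/bin/sh
# Prefix each "ls -l" line with a YYYYMMDD key built from fields 8, 6
# and 7, sort on that key, then strip the key off again.
sort_by_date() {
    awk '
        BEGIN {
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
            for (i = 1; i <= n; i++) month[name[i]] = i
        }
        { printf "%04d%02d%02d\t%s\n", $8, month[$6], $7, $0 }
    ' | sort -k1,1 | cut -f2-
}

# Three lines from the listing in this thread, deliberately out of order.
sample='-rw-rw-r-- 1 logoview samba 1548 Aug 29 2007 /logo/jpg/CustArtRepository/00/60/0081160_CA003_82.jpg
-rw-r--r-- 1 logoview samba 13567 Nov 07 2002 /logo/jpg/CustArtRepository/02/90/0282190_CA001_82.jpg
-rw-r--r-- 1 logoview samba 11700 Nov 06 2002 /logo/jpg/CustArtRepository/02/90/0281490_CA001_82.jpg'

sorted=$(printf '%s\n' "$sample" | sort_by_date)
printf '%s\n' "$sorted"
```

Lines whose field 8 is a time such as 16:35 would get a meaningless key, so they need to be filtered out or handled separately before this step.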
How stupid of me, and I apologize. You did say that you were interested in sorting on the date time and not the file size.
Of course, the standard Unix ls command, for files created in the last 6 months, displays the time instead of the year. Check out this computing.net link:
I know ls displays the time instead of the year for files that were created within the last 6 months. So, I changed my find command as shown below.
find /logo/jpg/CustArtRepository -type f -mtime +365 ...
I made this change for the following two reasons.
1. This will avoid the timestamp issue of ls.
2. I want to archive only the old files, not the files that were created in the last 365 days.
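Putting the pieces together, the whole job might look roughly like the sketch below. It is untested on AIX and rests on several assumptions: find supports `-exec ... {} +`, file names contain no whitespace (as in the listing), and the function name `archive_oldest` and its three arguments are placeholders invented here.

```shell
#!/bin/sh
# Sketch only: move the "limit" oldest files under "src" that are more
# than 365 days old into "dest". All names here are placeholders.
archive_oldest() {
    src=$1 dest=$2 limit=$3
    # LC_ALL=C keeps the ls date columns in the "Mon DD YYYY" layout.
    LC_ALL=C find "$src" -type f -mtime +365 -exec ls -l {} + |
    awk '
        BEGIN {
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
            for (i = 1; i <= n; i++) month[name[i]] = i
        }
        # Key is YYYYMMDD; -mtime +365 means field 8 is a year, not a time.
        { printf "%04d%02d%02d %s\n", $8, month[$6], $7, $9 }
    ' |
    sort -k1,1 |
    head -n "$limit" |
    while read -r key path; do
        mv "$path" "$dest"/
    done
}
```

It would be called along the lines of `archive_oldest /logo/jpg/CustArtRepository /path/to/archive 5000`, where the archive path is hypothetical. The sort happens once, over the complete list, so the xargs batching problem does not arise. Moving everything into one flat directory will overwrite files that share a name across subdirectories, so check for that before running anything like this on real data.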