Content Scanner for a few thousand files

May 19, 2009 at 00:51:27
Specs: CentOS
I just started to create my own small content scanner that searches all the visible files on my server, but now I am stuck.

What I tried is the following code:

#!/bin/bash


find /home/userid*/public_html/ -size -307200c -exec grep -H -n -i -l 'www.exampleurl1.com/favicon.ico\|www.exampleurl2.com/v/' > /home/mypath/scan_content.php {} \;

That code first finds all the files within the public_html folders that are smaller than 307200 bytes, then scans the content of those files.

That worked fine for the first few thousand files, but then it stopped working. I think there are too many files, so grep can't read all of them, or something else is wrong. There is no error; the process just stays alive forever with CPU and memory usage at 0.

So it would be great if someone had an idea of how to write the scanner so that it also works with a few hundred thousand files.

Thanks




#1
May 19, 2009 at 06:19:45
Pipe your find output to xargs:
find ...... | xargs grep .....

Check the man page and google for "find and xargs" to see examples of how they are used.
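For example (a rough, untested sketch reusing the size, paths and pattern from the original post, assuming GNU find and xargs as shipped on CentOS), this runs grep on whole batches of file names instead of starting one grep process per file:

# one grep per batch of file names; -print0/-0 keep names with spaces intact
find /home/userid*/public_html/ -type f -size -307200c -print0 | xargs -0 grep -l -i 'www.exampleurl1.com/favicon.ico\|www.exampleurl2.com/v/' > /home/mypath/scan_content.php

-type f skips directories, and -l prints only the names of the matching files. The single redirect at the end collects the output of every grep invocation that xargs starts.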


#2
May 19, 2009 at 06:28:33
Thanks for the hint. Do you know if xargs has any limit?


#3
May 19, 2009 at 06:39:19
Search the xargs man page; it's covered in one of the paragraphs. Alternatively, you can use a while read loop after find:
find ..... | while read line
do
 #do something, e.g. grep?
done
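To answer #2 directly: xargs does not choke on the number of files; it splits the list so that each grep call stays below the system's argument-length limit. A quick way to look at that limit (getconf ARG_MAX is standard; --show-limits only exists in newer GNU findutils, so treat it as optional):

# kernel limit on the total size of a command line, in bytes
getconf ARG_MAX
# newer GNU xargs can report the limits it will actually respect
xargs --show-limits < /dev/null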



#4
May 19, 2009 at 13:31:45
I am completely stuck now. I tried to combine find, xargs and grep and write the results into a file, but I am just a beginner, so I would appreciate your help.

Here is one of my last tries:

------------------------
#!/bin/bash
find /home/userid*/public_html/ -size -2048k | xargs grep -H -n -i -l 'phrase1\|phrase2' > /home/filepath/public_html/path/scans/scan_result.php {} \;
-------------------------



#5
May 19, 2009 at 17:39:15
Have you tried the while read loop method?
find /home/userid*/public_html/ -size -2048k | while read FILE
do
 grep 'pattern' FILE > newfile
done



#6
May 20, 2009 at 02:26:44
I just tried it like this:

find /home/userid*/public_html/ -size -2048k | while read FILE
do
grep 'myphrase' > pathtomyfile/scan_result.php
done

and like this:


find /home/userid*/public_html/ -size -2048k | while read FILE
do
grep 'myphrase' FILE > pathtomyfile/scan_result.php
done

It runs for a while, but it does not write any file names into my result file. I added a few of the phrases to different files beforehand to make sure there would be some matches.



#7
May 20, 2009 at 05:12:32
FILE is a shell variable; to use its value, you must interpolate it as $FILE:
find ... |while read FILE
do
  grep pattern $FILE ....
done
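As an aside, quoting the variable as "$FILE" also keeps file names that contain spaces in one piece; a rough sketch (where the output should go is picked up in the follow-ups below):

find /home/userid*/public_html/ -size -2048k | while read FILE
do
 # quotes keep a name like "my page.html" as a single argument to grep
 grep -i -l 'myphrase' "$FILE"
done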



#8
May 20, 2009 at 05:47:46
I just tried the following code:

find /home/userid*/public_html/ -size -2048k | while read FILE
do
grep 'myphrase' $FILE > pathtomyfile/scan_result.php
done

and also

find /home/userid*/public_html/ -size -2048k | while read FILE
do
grep -i -l 'myphrase' $FILE > pathtomyfile/scan_result.php
done

Both start, run for a short moment and finish, but without writing anything into the result file.

Sorry for asking again.



#9
May 20, 2009 at 05:58:43
Because you are using >, the output file is overwritten on every iteration of the loop.


#10
May 20, 2009 at 06:09:46
And how should I store it instead?

find /home/userid*/public_html/ -size -2048k | while read FILE
do
grep -i -l 'myphrase' $FILE pathtomyfile/scan_result.php
done

That wouldn't store the result, would it?



#11
May 20, 2009 at 06:13:08
Use >> instead of >. This is very basic stuff; it's called appending. Please revise your shell basics.
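For instance (untested, with the placeholder path kept from the thread), the loop from #8 with appending becomes:

find /home/userid*/public_html/ -size -2048k | while read FILE
do
 # >> appends, so each matching file name is added instead of replacing the file
 grep -i -l 'myphrase' "$FILE" >> pathtomyfile/scan_result.php
done

Redirecting once after done (done > pathtomyfile/scan_result.php) also works and opens the result file only once instead of once per match.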


#12
May 20, 2009 at 07:14:46
Thanks for the help. I just tested it, but it only causes problems.

The load gets so high that even an 8-core machine with 6 GB of RAM gets stuck. I just stopped it at a server load of nearly 19.

So does anyone know a better way of scanning such a large number of files? Or does anyone know of software or a script that already exists for such cases?

