Script to scan files for a string

Original Message
Name: Jingling
Date: August 15, 2006 at 07:56:04 Pacific
Subject: Script to scan files for a string
OS: Unix
CPU/Ram: Solaris
Comment:

I am looking for a shell script that scans all files on the system for a particular string. When a file contains it, the script should write the file name, path, and the number of each matching line to a text log file. Can the script gurus help?

Many thanks



Response Number 1
Name: lchi2000g
Date: August 15, 2006 at 08:35:06 Pacific
Reply:

find / -type f -exec grep -Hn "STRINGTOFIND" {} \;


/: start the search at the root directory
-type f: consider regular files only
-Hn: print the file name (-H, a GNU grep option) and the line number (-n) of each match
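
The original question also asked for the results to go to a text log file. A minimal sketch of one way to do that, assuming a POSIX shell and a POSIX grep; the function name scan_tree and its arguments are hypothetical, and /dev/null is passed as an extra file so that grep prints the file name even where -H is not available:

```shell
# scan_tree DIR STRING LOGFILE   (hypothetical helper, not from the thread)
# Writes "path:line-number:matching line" records to LOGFILE.
scan_tree() {
    dir=$1 string=$2 log=$3
    # -F treats STRING as a fixed string, -n adds the line number.
    # The extra /dev/null argument makes grep see two files, so it
    # prefixes every match with the file name without needing -H.
    # 2>/dev/null hides "Permission denied" noise from unreadable files.
    find "$dir" -type f -exec grep -Fn -- "$string" /dev/null {} \; \
        > "$log" 2>/dev/null
}
```

It could be called as: scan_tree / "STRINGTOFIND" /tmp/scan.log. Keeping the log file outside the tree being scanned avoids grep reading its own output.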

Luke Chi


Response Number 2
Name: nails
Date: August 15, 2006 at 14:16:18 Pacific
Reply:

The problem with Luke's solution is that it also greps binary files, and printing matching binary data can leave the terminal in an unusable state.

# all on 1-line
find . -type f -print | xargs file | grep -i text | cut -f1 -d: | xargs egrep "$*"

This solution will search all text files in the current hierarchy. Here's what it does:

1) find lists all regular files

2) xargs, instead of -exec, hands the file names to the next command in batches that fit within the argument-length limit

3) the file command reports the type of each file

4) grep keeps only the lines that describe text files, so binary files are eliminated

5) cut extracts the file name (the field before the colon) from the grep output

6) finally, egrep searches those text files for the expression in $*
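
The six steps above can be sketched as a small shell function. This is only an illustration under some assumptions: the name grep_text_files is hypothetical, the text filter is narrowed to ':.*text' so that a file whose name merely contains "text" is not kept by accident, grep -E stands in for egrep, and /dev/null is added so that the file name is always printed. Like the one-liner, it still breaks on file names that contain spaces, quotes, or colons.

```shell
# grep_text_files PATTERN [DIR]   (hypothetical name)
# Searches only files that the 'file' command describes as text.
grep_text_files() {
    pattern=$1 dir=${2:-.}
    find "$dir" -type f -print |   # 1) list regular files
        xargs file |               # 2-3) classify them, many names per call
        grep -i ':.*text' |        # 4) keep descriptions that mention text
        cut -f1 -d: |              # 5) keep the file name before the colon
        xargs grep -En -- "$pattern" /dev/null   # 6) search those files
}
```

For example, grep_text_files 'STRINGTOFIND' /etc would print path:line-number:line for each match in the text files under /etc.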


Response Number 3
Name: lchi2000g
Date: August 15, 2006 at 18:02:40 Pacific
Reply:

Unix and Linux have changed a lot. What is true on one platform may be false on another, and what is true now may be false in a later version.

The following was tested on this platform a few minutes ago:

/home/oracle$ uname -a
Linux server_1 2.6.9-34.0.1.ELsmp #1 SMP Wed May 17 17:05:24 EDT 2006 i686 i686 i386 GNU/Linux

1. Running grep on a binary file did not cause a problem. The return code was 1 (no match found).

/home/oracle$ type more
more is /bin/more

/home/oracle$ file /bin/more
/bin/more: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped

/home/oracle$ grep GOOD /bin/more
/home/oracle$ echo $?
1
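
For reference, grep's exit status distinguishes three cases: 0 when a match was found, 1 when none was found, and a value greater than 1 on error. A small sketch of checking that status without printing any file content to the terminal, assuming a POSIX grep that supports -q; the helper name has_string is hypothetical:

```shell
# has_string FILE STRING   (hypothetical helper)
# Reports grep's exit status in words; -q suppresses all normal output,
# so nothing from a binary file ever reaches the terminal.
has_string() {
    grep -q -F -- "$2" "$1" 2>/dev/null
    case $? in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error" ;;
    esac
}
```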

2. Running find from / with -exec had no input overflow problem.

find / -type f -exec grep -Hn "STRINGTOFIND" {} \;

3. grep and egrep are the same program here; egrep is a symbolic link to grep.

$ file /bin/egrep
/bin/egrep: symbolic link to `grep'
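
Since egrep is just grep under another name on this system, the same extended-regular-expression search can be written with grep -E, which POSIX specifies. A brief sketch; the function name match_either and the two patterns are made up for illustration:

```shell
# match_either FILE   (hypothetical helper)
# Prints, with line numbers, the lines containing "error" or "warning".
# grep -E accepts the same extended regular expressions as egrep.
match_either() {
    grep -En -- 'error|warning' "$1"
}
```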

4. http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html

The weaknesses of xargs are (1) it can be confused by "funny" filenames (which is why GNU has find -print0 | xargs -0, or you can pipe through sed to add backslashes everywhere), and (2) it can feed zero arguments to the command, which might then just sit there waiting (this is why GNU xargs has -r, which means: don't run the command if stdin is empty).

That said, piping to xargs does run faster than using -exec ... \; because grep is started once per batch of files rather than once per file.
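
The two GNU options mentioned in the quoted summary can be combined as in the sketch below. It assumes a find that supports -print0 and an xargs that supports -0 and -r, which older Solaris utilities do not; the function name grep_tree_safe is hypothetical.

```shell
# grep_tree_safe PATTERN [DIR]   (hypothetical name)
# Requires find -print0 and xargs -0 / -r (GNU extensions, also present
# in some BSD tools).
grep_tree_safe() {
    pattern=$1 dir=${2:-.}
    # -print0 / -0: names are separated by NUL bytes, so spaces, quotes
    #               and newlines in file names pass through intact.
    # -r:           do not run grep at all when find produced no names.
    # /dev/null:    forces grep to print the file name for every match.
    find "$dir" -type f -print0 | xargs -0 -r grep -n -- "$pattern" /dev/null
}
```

For example, grep_tree_safe 'STRINGTOFIND' /home would also cope with a file called "funny name's.txt".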

Luke Chi



Response Number 5
Name: ghostdog
Date: August 15, 2006 at 19:37:21 Pacific
Reply:

If you ever want to run your script on both Unix and Windows, here's one in Python.

import os

thedir = os.path.join("c:\\", "tmp")          # directory to scan
pattern = "STRING"                            # string to look for
logname = os.path.join(thedir, "output.txt")  # results are appended here

with open(logname, "a") as log:
    for root, dirs, files in os.walk(thedir):
        for name in files:
            path = os.path.join(root, name)
            if path == logname:
                continue  # do not scan the log file itself
            try:
                # errors="replace" keeps binary files from raising decode errors
                with open(path, errors="replace") as f:
                    for num, line in enumerate(f, 1):
                        if pattern in line:
                            msg = "%s found in %s , line number %d" % (pattern, path, num)
                            print(msg)
                            log.write(msg + "\n")
            except OSError:
                continue  # unreadable file: skip it


