Word count in unix

Original Message
Name: srikanth_onlinein
Date: September 21, 2005 at 05:15:30 Pacific
Subject: Word count in unix
OS: Sun Solaris
CPU/Ram: 2gb
Comment:

Hi Friends,

What is the fastest way to calculate the word count of a Unix flat file?
My flat file contains about 150 million records.
The conventional way, wc -l, is taking a huge amount of time to give the word count. Is there a faster way of doing it?

Please reply soon; it's urgent for me.

Regards,
Srikanth


Response Number 1
Name: nails
Date: September 21, 2005 at 09:19:06 Pacific
Subject: Word count in unix
Reply:

To count words using Unix tools other than wc, you'd probably have to write an awk or perl script. Without running tests, I can't be precise, but I don't think you can improve much on the speed of wc.
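For reference, an awk word counter of the kind mentioned above might look like the sketch below. The function name count_words_awk is made up for illustration, and the sketch assumes words are separated by blanks, tabs, or newlines (awk's default field splitting), which can differ from wc -w on unusual input. It has not been timed against wc.

```shell
# Hypothetical helper: count whitespace-separated words in a file with awk.
# NF is the number of fields (words) on the current line.
count_words_awk() {
    # "n + 0" forces numeric output, so an empty file prints 0.
    awk '{ n += NF } END { print n + 0 }' "$1"
}
```

Calling count_words_awk bigfile prints one number, which should normally match the second column of wc bigfile.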



Response Number 2
Name: Jim Boothe
Date: September 21, 2005 at 11:30:05 Pacific
Subject: Word count in unix
Reply:

If your large file happens to be a permanent file (the same file day in and day out) that grows by having lines appended, you might get some mileage out of establishing a baseline. This approach ran 17% faster for me on a file with 7 million lines. That's not a lot, but your results may vary.

In the example below, I am using a much smaller file.  The counts below show the entire file, then a "base count" for the first 10000 lines:

wc bigfile
11284 88935 539877 bigfile

basecount=10000
head -$basecount bigfile | wc
10000 79311 481362

Now, each day I just have to process the lines beyond the base:

tail +$((basecount+1)) bigfile | wc
1284 9624 58515

Of course, the tail command still has to scan the file, but it appears to do that a bit faster than wc.
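Put together, the baseline idea might be wrapped up as in the sketch below. The function name count_lines_from_base is made up for illustration. It assumes the first lines of the file never change once counted, and it uses the POSIX spelling tail -n +K rather than the older tail +K shown above; which form your tail accepts depends on the system.

```shell
# Hypothetical helper: total lines = stored baseline + lines added since.
# $1 = file, $2 = number of lines already counted (the baseline).
count_lines_from_base() {
    file=$1
    base=$2
    # tail -n +K starts output at line K, so K = base + 1 skips the baseline.
    extra=$(tail -n +"$((base + 1))" "$file" | wc -l)
    echo $((base + extra))
}
```

With the numbers from the example, count_lines_from_base bigfile 10000 would add the 1284 new lines to the baseline and print 11284. The same addition works for the word and byte columns (79311 + 9624 = 88935 words).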


Response Number 3
Name: Dlonra
Date: September 23, 2005 at 07:06:13 Pacific
Subject: Word count in unix
Reply:

You say "wordcount" but also say "wc -l".
Do you want words or lines?

If you want words, does each line in the file contain a different number of words?
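The reason the second question matters: if every line happened to hold the same number of words, the word count could be derived from the line count. A rough sketch of that shortcut follows; the function name estimate_words_fixed_width is made up, and the sketch assumes (without checking) that the first line is representative of all the others.

```shell
# Hypothetical shortcut: valid only if every line has the same number of words.
# total words = (number of lines) x (words on the first line)
estimate_words_fixed_width() {
    file=$1
    lines=$(wc -l < "$file")                # lines in the whole file
    per_line=$(head -n 1 "$file" | wc -w)   # words on the first line only
    echo $((lines * per_line))
}
```

This still reads the whole file once to count lines, so any saving over wc -w is untested.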

