Hi Friends,
What is the fastest way to calculate the word count of a Unix flat file?
My flat file contains about 150 million records.
The conventional way, wc -l, is taking a huge amount of time to give the word count. Is there any other way of doing it faster? Please reply soon, it's urgent for me.
Regards,
Srikanth

In order to count words using unix tools other than wc, you'd probably have to write an awk or perl script. Without running tests, I can't be precise, but I don't think you can improve much on 'wc's speed.
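For reference, a minimal sketch of the kind of awk script mentioned above might look like this. The function name count_words_awk is made up for illustration, and it assumes that awk's default whitespace field splitting matches what you mean by a "word". It is untimed here and is not expected to beat wc.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated words in a file with awk.
# NF is the number of fields on the current line; summing it gives a word total.
count_words_awk() {
    # "words + 0" forces a numeric 0 to be printed for an empty file.
    awk '{ words += NF } END { print words + 0 }' "$1"
}
```

Usage would be something like: count_words_awk bigfile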

If your large file happens to be a permanent file (the same file day in and day out) that grows, you might get some mileage out of establishing a base line. This approach ran 17% faster for me on a file with 7 million lines. That's not a lot, but "your results may vary".
In the example below, I am using a much smaller file. The counts below show the entire file, then a "base count" for the first 10000 lines:
wc bigfile
11284 88935 539877 bigfile
basecount=10000
head -n "$basecount" bigfile | wc
10000 79311 481362
Now, each day I just have to process the lines beyond the base:
tail -n +$((basecount+1)) bigfile | wc
1284 9624 58515
Of course, the tail command still has to scan the file, but it appears to do that a bit faster than wc.
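Put together, the baseline idea could be wrapped up roughly as follows. This is only a sketch: count_with_baseline and its three arguments are hypothetical names, and it assumes the file only ever grows by appending, so the stored base counts stay valid.

```shell
#!/bin/sh
# Hypothetical helper: add the counts for lines beyond a stored baseline
# to the baseline counts, instead of recounting the whole file.
count_with_baseline() {
    file=$1        # file that only grows by appending (assumption)
    base_lines=$2  # number of lines already counted
    base_words=$3  # number of words in those lines

    # Count only the lines after the baseline; wc -lw prints lines, then words.
    # The unquoted substitution lets the shell drop any padding wc adds.
    set -- $(tail -n +"$((base_lines + 1))" "$file" | wc -lw)

    # Print total lines and total words.
    echo "$((base_lines + $1)) $((base_words + $2))"
}
```

With the figures from the example above, the call would be count_with_baseline bigfile 10000 79311, which adds 1284 lines and 9624 words to the base and so should report 11284 lines and 88935 words.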

You say "wordcount" but also say "wc -l".
Do you want words or lines? If you want words, is there a different number of words in each line in the file?

