Name: srikanth_onlinein
Date: September 21, 2005 at 05:15:30 Pacific
Subject: Word count in unix
OS: Sun Solaris
CPU/Ram: 2gb
Comment:
Hi Friends,
What is the fastest way to calculate the word count of a unix flat file? My flat file contains about 150 million records. The conventional way, wc -l, is taking a huge amount of time to give the word count. Is there any other way of doing it faster?
In order to count words using unix tools other than wc, you'd probably have to write an awk or perl script. Without running tests, I can't be precise, but I don't think you can improve much on 'wc's speed.
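For reference, here is a rough sketch of what such an awk script could look like. The function name count_with_awk is made up for illustration, and it assumes a POSIX awk (on Solaris that usually means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk). It prints the line count and the word count, where a "word" is an awk field, which may differ slightly from what wc counts on unusual input. I would not expect it to beat wc.

```shell
# Hypothetical helper (not from the original post): count lines and words
# of a file with awk instead of wc.
# Assumes a POSIX awk; NF is the number of whitespace-separated fields
# on the current line, NR is the number of lines read so far.
count_with_awk() {
    awk '{ words += NF } END { print NR, words + 0 }' "$1"
}

# Usage (prints "<lines> <words>"):
#   count_with_awk bigfile
```

Timing it against wc on a copy of your own file (for example with "time") is the only way to know which one is faster on your system.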
If your large file happens to be a permanent file (the same file day in and day out) that only grows, you might get some mileage out of establishing a baseline: count the existing lines once, then count only the newly added lines on later runs. This approach ran 17% faster for me on a file with 7 million lines. That's not a lot, but "your results may vary".
In the example below, I am using a much smaller file. The counts below show the entire file, then a "base count" for the first 10000 lines:
wc bigfile
11284 88935 539877 bigfile

basecount=10000
head -$basecount bigfile | wc
10000 79311 481362
Now, each day I just have to process the lines beyond the base: