Select random numbers from a txt file

April 9, 2009 at 13:12:09
Specs: Windows XP
I have a txt file with 18 million CSV records.
I need to select 20,000 random numbers from it.

Can anyone help me do this?

MS Excel allows only 1 million records.


April 9, 2009 at 13:38:19
I guess you could script it but it might be pretty slow.

If at first you don't succeed, you're about average.



April 9, 2009 at 19:31:59
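If you can download gawk for Windows (see my sig), you can do it like this on the command line:
C:\test>gawk "BEGIN{srand();while(n<20000){r=int(rand()*18000000)+1;if(!(r in random)){random[r];n++}}}(NR in random){print}" file.txt
The loop redraws whenever a line number repeats, so exactly 20,000 distinct lines are printed. It assumes the file has 18,000,000 lines.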
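
For anyone without gawk, the same single-pass idea (choose the line numbers first, then keep the matching lines as the file streams by) can be sketched in Python. This is only a minimal sketch: the function name `sample_lines` and its parameters are made up for illustration, and it assumes you already know the total line count. It uses `random.sample`, which draws without replacement, so the selected line numbers are always distinct.

```python
import random


def sample_lines(path, total_lines, k, seed=None):
    """Return k distinct lines chosen at random from a file with total_lines lines.

    Hypothetical helper for illustration; total_lines must not exceed the
    real line count, otherwise fewer than k lines come back.
    """
    rng = random.Random(seed)
    # Draw k distinct 1-based line numbers (sampling without replacement).
    wanted = set(rng.sample(range(1, total_lines + 1), k))

    picked = []
    # Binary mode avoids decoding errors and keeps each line's bytes unchanged.
    with open(path, "rb") as fh:
        for lineno, line in enumerate(fh, start=1):
            if lineno in wanted:
                picked.append(line)
                if len(picked) == k:
                    break  # everything requested was found; skip the rest
    return picked
```

A call such as `sample_lines("file.txt", 18_000_000, 20_000)` would return the selected lines in file order. Only the 20,000 line numbers and the selected lines are held in memory, not the whole file.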

Unix Win32 tools | Gawk for Windows


April 9, 2009 at 19:57:11
With a CSV file it is assumed the records are of variable length, which precludes using random access. I would convert it to a random access file by finding the largest size of each field and writing a new file in which every shorter field is padded to that size. Then use a random number generator to select the records.
Record reads: 36,020,000 (two passes over the 18,000,000 records plus 20,000 random reads); writes: 18,000,000; versus an estimated 180,000,000,000 reads (20,000 selections averaging 9,000,000 sequential reads each) if the random access file is not created. An additional 20,000 writes are needed to save the results.
This is done with a program that reads and writes the file and selects the records. There is almost a 1 in 1000 chance that a given selection duplicates a record already chosen. Requiring unique records will increase the run time, depending on the selection criteria.
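
The padding approach described above might look roughly like this in Python. It is a sketch under stated assumptions, not a tuned implementation: the function names (`build_fixed_width`, `read_record`, `sample_records`) are hypothetical, fields are padded with trailing spaces, and the input is assumed to be UTF-8. Pass 1 finds the widest value in each column, pass 2 writes the fixed-length file, and after that any record can be fetched with a single seek. Selection uses `random.sample` instead of a plain random number generator, so duplicates cannot occur.

```python
import csv
import random


def build_fixed_width(csv_path, out_path, encoding="utf-8"):
    """Write a fixed-length copy of a CSV file; return (column widths, record count)."""
    # Pass 1: find the widest value (in encoded bytes) of each column.
    widths = []
    with open(csv_path, newline="", encoding=encoding) as src:
        for row in csv.reader(src):
            for i, field in enumerate(row):
                n = len(field.encode(encoding))
                if i == len(widths):
                    widths.append(n)
                elif n > widths[i]:
                    widths[i] = n

    # Pass 2: write every record with each field space-padded to its column width.
    count = 0
    with open(csv_path, newline="", encoding=encoding) as src, \
            open(out_path, "wb") as dst:
        for row in csv.reader(src):
            row = row + [""] * (len(widths) - len(row))  # short rows get empty fields
            dst.write(b"".join(f.encode(encoding).ljust(w)
                               for f, w in zip(row, widths)))
            count += 1
    return widths, count


def read_record(fh, index, widths, encoding="utf-8"):
    """Seek straight to 0-based record `index` and return its fields.

    Padding is stripped, so trailing spaces in the original data are lost.
    """
    record_len = sum(widths)
    fh.seek(index * record_len)
    raw = fh.read(record_len)
    fields, pos = [], 0
    for w in widths:
        fields.append(raw[pos:pos + w].rstrip(b" ").decode(encoding))
        pos += w
    return fields


def sample_records(fixed_path, widths, count, k, seed=None):
    """Return k distinct records chosen at random from the fixed-length file."""
    rng = random.Random(seed)
    with open(fixed_path, "rb") as fh:
        return [read_record(fh, i, widths) for i in rng.sample(range(count), k)]
```

The conversion only pays off if the padded file will be sampled more than once. For a single sample of 20,000 records, one sequential pass that picks out pre-chosen line numbers reads less data.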
