count occurance of strings within given limit

September 19, 2010 at 20:10:43
Specs: Linux i686, Intel P4 (2.4 GHz) / 2 GB
I have a large text file (~900MB) containing date and time values. The text file looks like

1998/06/09 22:00:01
1998/06/09 22:00:01
1998/06/09 22:00:01
1998/06/09 22:00:01
1998/06/09 22:00:02
1998/06/09 22:00:02
1998/06/09 22:00:03
1998/06/09 22:00:03
1998/06/09 22:00:04

and so on till
1998/06/10 21:59:59
1998/06/10 22:00:00

I need to club these date and time in 2 hr blocks, calculate the entries in each block and print. Thus the output I want is

1998/06/09 00:00:00 <Number of entries in first 2 hrs>
1998/06/10 02:00:00 <Number of entries in next 2 hrs>
1998/06/10 04:00:00 <Number of entries in next 2 hrs>

and so on

Its a very big text file so solution using sed/awk would be preferred.

Any help would be highly appreciated.

Thanks !


See More: count occurance of strings within given limit

Report •

#1
September 22, 2010 at 22:33:29
Try BASIC. It can be done with a simple program in one pass.

Report •

#2
September 23, 2010 at 01:25:17
Handling the dates & times in sed or awk is going to be a nightmare.

I would use sed/awk to convert the data to MySQL inserts, then put it into a temporary table.

After, you can easily run a query to grab the data you're after, something like:

SELECT
 MIN(date), 
 COUNT(*)
FROM
 {imported_dates_table}
GROUP BY
 DATE(date), FLOOR(HOUR(date)/2)



Report •

#3
September 23, 2010 at 17:38:51
Thanks for your reply.

I have written a shell script including some 'sed' commands to do the same but it is taking hell lot of time. So was just curious to know if some 'awk' program would do this in less time for me.

Thanks !


Report •

Related Solutions

#4
September 23, 2010 at 18:19:38
Thanks for your reply !!

I hadn't considered this option. Can you help me with sed/awk to convert the data to MySQL inserts.

Thanks !!


Report •

Ask Question