Extract lines from multiple files and merge

January 27, 2011 at 15:28:46
Specs: Windows XP
I need to extract 2 lines from each of 1824 text files and merge them into 1 file, with the original filename (without the extension) appended to each line so I know where it came from. Each text file has ~1 million records and I need to pull records 555029 and 555030. Yes, it's the same two record numbers in every file...

Any help would be appreciated...

J




#1
January 28, 2011 at 00:04:58
not well tested

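:: ==========================================
::
:: silicic.bat Fri 28-01-2011 14:46:20.85
:: Creates an empty "newfile", then scans every .txt file in the current directory.
@echo off > newfile & setLocal enableDELAYedeXpansion

for /f "tokens=* delims= " %%t in ('dir /b *.txt') do (
rem Reset the line counter for each file.
set N=0
for /f "tokens=* delims= " %%a in (%%t) do (
set /a N+=1
rem Write the two wanted lines, prefixed with the file name without its extension.
if !N! equ 555029 >> newfile echo.%%~Nt %%a
if !N! equ 555030 >> newfile echo.%%~Nt %%a
)
)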


=====================================
Life is too important to be taken seriously.

M2



#2
January 28, 2011 at 07:52:29
Thanx for the response... I'm still not having any luck. But I did create individual batch files, using one of your previous responses, to pull the records and put them in a new file. Since I have so many files to process, I'd like to nest this batch file in another batch file, but so far I can only get it to run the first line:

my extraction file is in_out.bat

@echo off > %2 & setLocal enableDELAYedeXpansion
for /f "skip=555029 tokens=* delims= " %%a in (%1) do (
>> %2 echo.%%a
)

my multiple run file (run_it_all.bat) has

in_out infile outfile
...
in_out infile outfile

1800 times with the correct filenames but it stops after the first one.

Is there something I need to change in my extraction file to allow it to run multiple times?

Thanx...
J



#3
January 28, 2011 at 07:54:35
And yes, I'm totally aware that this is an inelegant solution, but it's MUCH easier than opening every one of the 32M files in Excel and pulling out the individual records (which are only 4 parameters)! :)
J


#4
January 28, 2011 at 08:08:24
Got it - I added "call" to every line of run_it_all.bat:

call in_out input output

and it's running now....

Thanx again - your posts have enabled me to make it work :)
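
For later readers: when one batch file starts another without "call", control passes to the second file and never comes back, which is why only the first line ran. A loop could also replace the 1800 hand-written lines. A minimal sketch, assuming the inputs match *.txt and have no spaces in their names; the .out extension for the results is only a made-up example:

```batch
@echo off
rem run_it_all.bat (sketch): call in_out.bat once per input file.
rem Assumptions: inputs match *.txt, names contain no spaces,
rem and the .out output extension is hypothetical.
for %%f in (*.txt) do call in_out.bat %%f %%~nf.out
```

Here %%~nf expands to the file name without its extension, so data1.txt would be written to data1.out.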



#5
January 28, 2011 at 08:28:34
One last question - once I've extracted my two lines, is there an easy way to add the filename to the end of each line of text?

i.e.
filenames are 19921014.hot1 and 19921014.hot2

type 19921014.hot1 >> 19921014.hot
type 19921014.hot2 >> 19921014.hot

I can easily get it into the new file as another row but I'd prefer to have it appended to the same row as the data....

Thanx,
J
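
One way this might be done is to read each extracted file with for /f and echo the line and the name together, rather than using type. A sketch using the two example filenames from the question; it assumes the lines are not blank and do not start with a semicolon, since for /f skips both by default:

```batch
@echo off
setlocal
rem Sketch: append the source file name to the end of each extracted line.
rem Start from a clean output file so reruns do not duplicate rows.
if exist 19921014.hot del 19921014.hot
for %%f in (19921014.hot1 19921014.hot2) do (
  rem "delims=" keeps the whole line in %%a; usebackq allows the quoted name.
  for /f "usebackq tokens=* delims=" %%a in ("%%f") do (
    rem %%~nf is the name without extension; %%~nxf would keep the extension.
    >>19921014.hot echo.%%a %%~nf
  )
)
```

Each output row is the original line, a space, then 19921014. Swap in %%~nxf if the .hot1 and .hot2 suffix is needed to tell the rows apart.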



#6
January 29, 2011 at 01:15:23
I'd like to know what doesn't work with my first script.



M2



#7
January 29, 2011 at 14:23:14
Sorry M2, I haven't used batch scripts in 15 years and what little I knew has gone away. I tried your script but it didn't work - I can check again next week and try to figure out what was wrong, I can't access the data from home....

J



#8
January 29, 2011 at 22:06:29
But you can create a few test files.



M2



#9
January 31, 2011 at 08:46:05
I could have, had I been on a computer and not my phone!

In any case, I turned echo on so I could watch it run - it looks like it outputs the data string correctly, but it's been running for 30 minutes now and there is still nothing in the output file. So I guess it might work, but if it takes more than 30 minutes to run each file it would take days to run 1824 files.

As I said, I used another script you wrote to extract the individual lines then merged them.

And at 35 minutes, the output file is now 1kb but the script is still reading the same file.

Thank you again for responding, I appreciate your help....

J



#10
January 31, 2011 at 14:23:35
I tried it on 2 files, one of about 1M lines and the other of about 4M lines. It took about 24 minutes.

Maybe VBS is faster. Try a new thread like: VBS extract lines from many large files.

good luck



M2
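
The slow part of the script in #1 is probably that cmd itself counts every line with set /a. One possible alternative that stays in batch is to let findstr number the lines and keep only the two wanted ones. This is a sketch that has not been timed on the original data; the *.txt pattern and the output name merged.out are assumptions:

```batch
@echo off
setlocal
rem Sketch: pull lines 555029 and 555030 from every .txt file into merged.out.
rem merged.out is a hypothetical name; it must not match *.txt.
if exist merged.out del merged.out
for %%f in (*.txt) do (
  rem findstr /n prefixes each line with "number:"; the second findstr
  rem keeps only the lines whose prefix is one of the two wanted numbers.
  for /f "tokens=1* delims=:" %%a in ('findstr /n /r "^" "%%f" ^| findstr /r /c:"^555029:" /c:"^555030:"') do (
    rem %%a is the line number, %%b is the record; %%~nf is the name without extension.
    >>merged.out echo.%%~nf %%b
  )
)
```

Unlike for /f, findstr /n also counts blank lines, so the numbers match what a text editor shows. One caveat: splitting on ":" also strips any colons at the very start of a record.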

