Batch extraction to text file

April 20, 2009 at 09:46:39
Specs: Windows XP
Hello there,

Can anyone help me get this working, please? I need to extract a portion of text from one text file into a new text file, based on a start pattern and an end pattern.

The start pattern is always on two lines, like this:

<li type='disc'><nobr>
STARTPATTERN

... and I would like to extract from the first line of this text (<li type....) up to a second pattern like this one:

<li type='disc'><nobr>
ENDPATTERN

...but in this case without taking any of this text, just up to the previous line.

Yes, I know it's a bit strange, but I've googled for more than an hour and found no app or command line tool able to do this.

I will be very grateful if anyone can help me with this.

Sweet greets,

banshie



#1
April 20, 2009 at 14:08:08
Why does this seem so eerily familiar?


=====================================
If at first you don't succeed, you're about average.

M2



#2
April 20, 2009 at 15:39:48
I had to look up "eerily" because my English doesn't go that deep...

eerie: def. Inspiring inexplicable fear, dread, or uneasiness; strange and frightening.

Well... I really thought someone could help me... First I tried to find a solution myself: I searched for a standalone product able to do this, and unexpectedly there is no such program, commercial or not, capable of extracting text between two different delimiters. I looked for a command-line tool first, but in the end it wouldn't have mattered if it was GUI-based... but no luck.

Then I searched for batch programming, and almost all the interesting results close to what I was looking for led me to the computing.net forums. Before posting I searched previous posts and related topics, and although there was no solution to my problem, I found a very helpful member community. I was really surprised that so many questions were answered, when everybody knows what happens in this sort of help forum... lots of uneducated "questioners" and a few "helpers".

Sorry if I skipped any step. I'm not here to disturb anyone; I only thought I could get a little help here, just as I love to help when someone needs something I know about.

Sweet greets,

banshie



#3
April 20, 2009 at 17:25:25
Works on myfile; outputs to newfile.

=================================
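@echo off > newfile & setLocal EnableDelayedExpansion

rem for /f skips blank lines, so N counts non-blank lines only.
rem Pass 1: find the two-line start marker. pre is set two lines above
rem STARTPATTERN so that the copy begins on the first line of the marker.
set /a N=0

for /f "tokens=* delims=" %%a in (myfile) do (
set /a N+=1
set curr=%%a
if "!curr!" equ "STARTPATTERN" if "!prev!" equ "<li type='disc'><nobr>" (
set /a pre=!N!-2
)
set prev=!curr!
)

rem Pass 2: find the two-line end marker. post is the number of its
rem first line, which is left out of the copy.
set /a N=0
set prev=

for /f "tokens=* delims=" %%a in (myfile) do (
set /a N+=1
set curr=%%a
if "!curr!" equ "ENDPATTERN" if "!prev!" equ "<li type='disc'><nobr>" (
set /a post=!N!-1
)
set prev=!curr!
)

rem Pass 3: copy every line numbered strictly between pre and post.
set /a N=0

for /f "tokens=* delims=" %%a in (myfile) do (
set /a N+=1
set str=%%a
if !N! gtr !pre! if !N! lss !post! (echo !str!)>> newfile
)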


=====================================
If at first you don't succeed, you're about average.

M2



#4
April 21, 2009 at 15:12:18
Many thanks ... you're my hero ;)

I've made a few changes, so maybe this great script by Mechanix2Go can help more people than just me... this is the new code:

Script Purpose: Extract the portion of text between two delimiters from a text file and write it to a new file. Valid for any text file, HTML included.

--------------------------------------------------------------
@echo off > newfile.txt & setLocal EnableDelayedExpansion

set /a N=0
set offsetup=0
set offsetdown=0

rem Pass 1: pre is the number of the last line skipped before the extract.
for /f "tokens=* delims=" %%a in (originalfile.txt) do (
set /a N+=1
set curr=%%a
if "!curr!" equ "START_PATTERN" (
set /a pre=!N!-1+!offsetup!
)
)

set /a N=0

rem Pass 2: post is the number of the first line skipped after the extract.
for /f "tokens=* delims=" %%a in (originalfile.txt) do (
set /a N+=1
set curr=%%a
if "!curr!" equ "END_PATTERN" (
set /a post=!N!+1+!offsetdown!
)
)

set /a N=0

rem Pass 3: copy every line numbered strictly between pre and post.
for /f "tokens=* delims=" %%a in (originalfile.txt) do (
set /a N+=1
set str=%%a
if !N! gtr !pre! if !N! lss !post! (echo !str!)>> newfile.txt
)
--------------------------------------------------------------

You can customize these fields to fit your needs.

originalfile.txt: The file containing the text to extract.
newfile.txt: The file created with the extracted lines.
START_PATTERN: The line where the extract starts. It is compared against the whole line.
END_PATTERN: The line where the extract ends. It is compared against the whole line.
set offsetup=0: Line offset at the START of the extracted text. Zero means the line with START_PATTERN is the first one included, -1 means the extract starts on the previous line, and +1 means it starts on the next line. You can give it any value (logical ;).
set offsetdown=0: Line offset at the END of the extracted text. Zero means the line with END_PATTERN is the last one included, -1 means the extract ends on the previous line, and +1 means it ends on the next line. You can give it any value (logical again ;).
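
To make the offsets concrete, here is a minimal sketch run on a made-up sample file. The sample contents are an assumption for illustration only, with the word "marker" standing in for the <li type='disc'><nobr> line from the original question. It also folds the two search loops into a single pass, which is a simplification of the script above and not the script itself.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample input, made up for this illustration.
(
echo header
echo marker
echo START_PATTERN
echo body one
echo body two
echo marker
echo END_PATTERN
echo footer
) > originalfile.txt

rem Start one line above START_PATTERN, stop two lines above END_PATTERN.
set offsetup=-1
set offsetdown=-2

rem Pass 1: record the line numbers that bound the extract.
set /a N=0
for /f "tokens=* delims=" %%a in (originalfile.txt) do (
set /a N+=1
if "%%a" equ "START_PATTERN" set /a pre=N-1+offsetup
if "%%a" equ "END_PATTERN" set /a post=N+1+offsetdown
)

rem Pass 2: copy every line numbered strictly between pre and post.
type nul > newfile.txt
set /a N=0
for /f "tokens=* delims=" %%a in (originalfile.txt) do (
set /a N+=1
if !N! gtr !pre! if !N! lss !post! echo %%a>> newfile.txt
)

type newfile.txt
```

On this sample START_PATTERN is line 3 and END_PATTERN is line 7, so pre becomes 1 and post becomes 6, and newfile.txt should receive lines 2 to 5: marker, START_PATTERN, body one, body two. That is the shape asked for in the first post: the line above the start pattern is kept, and both lines of the end marker are left out.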

Well, I hope this helps someone around here; credit goes to Mechanix2Go!

Sweet greets to everyone!


//banshie

