Solved: extract the lines located between two tags with a batch file

March 11, 2013 at 07:45:32
Specs: Windows 7
I need a batch file that extracts the lines located between two tags in a text file. For example:

my file.txt:

E024=ER

R024= 1.0BX/FT
TRAJET-
1 PR 5024 OP16MAR 6 LPD PLY BI1 22:30 06:30 17MAR 7 AL
SAC
BUREAUX EFFECTIFS -
1.ETAGE/A1 15MAR ETC 17MAR FRPLY
DATE LIMITE D'ECHANGE -
1.T- 29JAN 11:27 INDIVIDUEL
TK 471144962 DE
TELEPHONE -
1. NC
RECU DE-TE
C-TBH
PNO.HDQ4D 1127/29JAN13 QYOXPL H

E025=I

end file.

I want to extract all the lines between the strings "E024" and "E025", excluding the tag lines themselves. How can I do that? So far I only manage to get one line. Thank you in advance.
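For comparison, the request above can be sketched in Python (not a batch file, so it only helps if Python is available). This is a minimal sketch, assuming the tags are matched as plain substrings and that only the first E024..E025 range is wanted; the function name lines_between is hypothetical.

```python
def lines_between(lines, start_tag, end_tag):
    """Return the lines strictly between the first line containing start_tag
    and the next line containing end_tag (both tag lines are excluded)."""
    collected = []
    inside = False
    for line in lines:
        if not inside:
            if start_tag in line:
                inside = True  # start collecting from the next line
        elif end_tag in line:
            break  # stop at the closing tag without keeping it
        else:
            collected.append(line)
    return collected


# A shortened version of the sample file from the question.
sample = """E024=ER

R024= 1.0BX/FT
TRAJET-
E025=I
""".splitlines()

print("\n".join(lines_between(sample, "E024", "E025")))
```

If either tag is missing, the function returns whatever it has collected so far (an empty list when the start tag never appears).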





#1
March 11, 2013 at 18:27:11
I think I'm getting lazy/spoiled using vbscript, but here goes:
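'==== begin vbscript
set fso=createobject("scripting.filesystemobject")
' split the whole file at every "E024"; x(1) onward each start right after a tag
x=split(fso.opentextfile("file.txt",1).readall,"E024")
set y=fso.opentextfile("file.out",2,true)
for i=1 to ubound(x)
' drop the rest of the E024 line itself
p=instr(x(i),vbcrlf)
if p>0 then x(i)=mid(x(i),p+2)
' keep only the text before the next "E025"
y.writeline split(x(i),"E025")(0)
next
y.close
'==== end vbscript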


#2
March 11, 2013 at 20:51:49
:: ===== script starts here ===============
::
:: between.bat 2013-03-12 10:03:51.96
@echo off > newfile & setLocal enableDELAYedeXpansioN

set H=
set T=

for /f "tokens=1* delims=[]" %%a in ('find /n "E024" ^< myfile') do (
set H=%%a
)

for /f "tokens=1* delims=[]" %%a in ('find /n "E025" ^< myfile') do (
set T=%%a
)

for /f "tokens=1* delims=[]" %%a in ('find /n /v "" ^< myfile') do (
if %%a gtr !H! if %%a lss !T! echo.%%b
)>> newfile

goto :eof
::====== script ends here =================

=====================
M2 Golden-Triangle



#3
March 12, 2013 at 02:40:52
Sorry to take up more of your time, and thanks for your answer.
The example I gave is only a piece of what the file contains. In fact the file does not start with "my file.txt:". There is other information before the line containing "E024" and after the line containing "E025".
Here is the file logic:
"Rxyz" always follows "Exyz".
"Rxyz" can span many lines, while "Exyz" always fits on one line.
"C=" marks a comment, and it can appear anywhere.

Below is the detailed structure of my file.

_________Simulateur Ver 6.7.5________________________________ Page 1 ________
Plate-forme X Lu_name : ******** Term.: MOS Prot. : IP Lniata : BD4514
FICHIER : AUTO-NR-ECHANGE_X.ANS Date et heure de traitement : 29/01/2013, 11:25:52
______________________________________________________________________________

C=_____________________________________________________________________________________


C= NR_X


C= AUTO TRAIN - ECHANGE D UN BILLET AUTO TRAIN


C=_____________________________________________________________________________________


C============================= ===============================


C=*** PARTIE TECHNIQUE : ADMINISTRATEUR


E001=SO*

R001=00001713CONNECTEZ VOUS A
LA PLATEFORME

E002=SI1^TSTS^$

R002=FRPNO FRHDQ$DSX.A..SN.FR
29JAN
******************************************************
*
* BIENVENUE PLATEFORME X / NOBASE ZTPF 'ACPSXVAL'
* BONNE JOURNEE DE LA PART DE DSIV OX PT
*
******************************************************
*
* CONNEXIONS AUX GDS :
*
* TRAVELPORT - YES
* HERMES SORTANT - YES
* HERMES ENTRANT - NO

E003=ZTTCP DISP ALL

R003=CSMP0097I 11.27.31 CPU-P SS-BSS SSU-SN IS-01
TTCP0184I 11.27.31 IP CONNECTIONS DISPLAY
CURRENT DESIRED
OSA NAME STATUS STATUS LOCAL IP ADDR TRACE READ DATA
-------- ------ ------ --------------- ----- ---- ----
TST1704 ACTIVE ACTIVE 10.027.186.142 ALL 1704 1716
END OF DISPLAY

E004=QXI ...............................................................

..........................................................................................

I want to extract all lines between an "Exyz" line and the next "Exyz" line, excluding the tag lines themselves. Thank you, and sorry for not being explicit before.




#4
March 12, 2013 at 04:18:07
I want to extract all lines between an "Exyz" line and the next "Exyz" line, excluding the tag lines themselves. Thank you, and sorry for not being explicit before.

I can't find "Exyz" in the file.

=====================
M2 Golden-Triangle



#5
March 12, 2013 at 04:53:00
Mechanix2Go,

In the expressions "Exyz" and "Rxyz", x, y and z are digits [0-9].
Examples are "E001", "E002", ..., "E024", "E025", ..., "E100", ...

Thanks very much.



#6
March 12, 2013 at 05:16:18
If you tell me which lines you want, this might get done.

==================================


---------- SOMEFILE
[1]
[2]_________Simulateur Ver 6.7.5________________________________ Page 1 ________
[3]Plate-forme X Lu_name : ******** Term.: MOS Prot. : IP Lniata : BD4514
[4]FICHIER : AUTO-NR-ECHANGE_X.ANS Date et heure de traitement : 29/01/2013, 11:25:52
[5]______________________________________________________________________________
[6]
[7]C=_____________________________________________________________________________________
[8]
[9]
[10]C= NR_X
[11]
[12]
[13]C= AUTO TRAIN - ECHANGE D UN BILLET AUTO TRAIN
[14]
[15]
[16]C=_____________________________________________________________________________________
[17]
[18]
[19]C============================= ===============================
[20]
[21]
[22]C=*** PARTIE TECHNIQUE : ADMINISTRATEUR
[23]
[24]
[25]E001=SO*
[26]
[27]R001=00001713CONNECTEZ VOUS A
[28]LA PLATEFORME
[29]
[30]E002=SI1^TSTS^$
[31]
[32]R002=FRPNO FRHDQ$DSX.A..SN.FR
[33]29JAN
[34]******************************************************
[35]*
[36]* BIENVENUE PLATEFORME X / NOBASE ZTPF 'ACPSXVAL'
[37]* BONNE JOURNEE DE LA PART DE DSIV OX PT
[38]*
[39]******************************************************
[40]*
[41]* CONNEXIONS AUX GDS :
[42]*
[43]* TRAVELPORT - YES
[44]* HERMES SORTANT - YES
[45]* HERMES ENTRANT - NO
[46]
[47]E003=ZTTCP DISP ALL
[48]
[49]R003=CSMP0097I 11.27.31 CPU-P SS-BSS SSU-SN IS-01
[50]TTCP0184I 11.27.31 IP CONNECTIONS DISPLAY
[51]CURRENT DESIRED
[52]OSA NAME STATUS STATUS LOCAL IP ADDR TRACE READ DATA
[53]-------- ------ ------ --------------- ----- ---- ----
[54]TST1704 ACTIVE ACTIVE 10.027.186.142 ALL 1704 1716
[55]END OF DISPLAY
[56]
[57]E004=QXI ...............................................................
[58]
[59]..........................................................................................
[60]
[61]I want to extract all lines between the strings "Exyz" and next "Exyz", except them. Thank you and sorry for being not explicitly before.

=====================
M2 Golden-Triangle



#7
March 12, 2013 at 05:53:07
I want to retrieve every Rxyz text block and save them all in a file. Every Rxyz text block is followed by a blank line.

For example, retrieve the block "R003=......", which is delimited by "E003" and "E004". If the parser meets a comment (a line tagged with C=), it skips it. The output file should contain only the Rxyz text blocks.
Note: since you have numbered the lines, please do not assume in your script that particular lines, such as [20] and [21], are empty. In another file with the same structure, [20] and [21] may not be empty; it depends on the length of the Rxyz blocks.

The output file should look like this:

R001=00001713CONNECTEZ VOUS A
LA PLATEFORME

R002=FRPNO FRHDQ$DSX.A..SN.FR
29JAN
******************************************************
*
* BIENVENUE PLATEFORME X / NOBASE ZTPF 'ACPSXVAL'
* BONNE JOURNEE DE LA PART DE DSIV OX PT
*
******************************************************
*
* CONNEXIONS AUX GDS :
*
* TRAVELPORT - YES
* HERMES SORTANT - YES
* HERMES ENTRANT - NO

R003=CSMP0097I 11.27.31 CPU-P SS-BSS SSU-SN IS-01
TTCP0184I 11.27.31 IP CONNECTIONS DISPLAY
CURRENT DESIRED
OSA NAME STATUS STATUS LOCAL IP ADDR TRACE READ DATA
-------- ------ ------ --------------- ----- ---- ----
TST1704 ACTIVE ACTIVE 10.027.186.142 ALL 1704 1716
END OF DISPLAY

......................................................................................................................

Thanks



#8
March 12, 2013 at 07:00:15
:: ===== script starts here ===============
::
:: between2.bat 2013-03-12 10:03:51.96
@echo off > newfile & setLocal enableDELAYedeXpansioN

set H=
set T=

for /f "tokens=1* delims=[]" %%a in ('find /n "E003" ^< somefile') do (
set H=%%a
)

for /f "tokens=1* delims=[]" %%a in ('find /n "E004" ^< somefile') do (
set T=%%a
)

for /f "tokens=1* delims=[]" %%a in ('find /n /v "" ^< somefile') do (
if %%a gtr !H! if %%a lss !T! echo.%%b
)>> newfile

goto :eof
::====== script ends here =================

=====================
M2 Golden-Triangle



#9
March 12, 2013 at 18:31:37
I tried "reversing" your logic: instead of including the content of the "R" blocks, just exclude the "E" and "C" lines. Is that not the same thing? Of course, we're assuming standard CRLF line endings:
::==== script 1: removes blank lines, but may mess up content containing special characters (%, !, etc.)
@echo off & setlocal
(for /f "skip=4 tokens=*" %%a in ('findstr /v /r /b "^E...=.* ^C=" kerval2') do echo %%a)>kerval9
::===== end script 1
::===== script 2: all blank lines will be retained in output
@echo off & setlocal
more +4 kerval2 | findstr /v /r /b "^E...=.* ^C=">kerval9
::==== end script 2
The output goes to "kerval9", as written. Adjust as needed.
And, obviously, I made a gross assumption that there are four "header lines", since they're not tagged.
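The "exclude instead of include" logic above can also be sketched in Python, for readers who have it available. This is only a sketch under the same assumptions as the scripts: four untagged header lines, E tags written as "E" plus three characters plus "=", and comments starting with "C=". The names SKIP and keep_r_blocks are hypothetical.

```python
import re

# Lines that start with an E tag (E + three characters + "=") or a comment tag.
SKIP = re.compile(r"^(E...=|C=)")


def keep_r_blocks(lines, header_lines=4):
    """Drop the first header_lines lines, then drop every E-tag and C= line.
    Blank lines are kept, as in script 2."""
    return [line for line in lines[header_lines:] if not SKIP.match(line)]


sample = [
    "header 1", "header 2", "header 3", "header 4",
    "C= NR_X",
    "",
    "E001=SO*",
    "",
    "R001=00001713CONNECTEZ VOUS A",
    "LA PLATEFORME",
    "",
    "E002=SI1^TSTS^$",
]

for line in keep_r_blocks(sample):
    print(line)
```

Because the lines are handled as plain Python strings, characters such as % and ! pass through unchanged.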


#10
March 13, 2013 at 05:38:19
Thank you very much nbrane!

I have just tried your script and it works fine on the example.
I have seen your comment about script 2. Forgive my ignorance, since I'm a beginner.
What does the command 'more +4 kerval2' do, and what is its purpose? I thought 'more' was only used to page through command prompt output one screen at a time, so that it made no sense to use it when the output goes to a file (kerval9 in our case).

Also, if I want to remove all tabs and spaces at the end of each line of kerval2, how can I add that to your script?

With that, I think I can go ahead.

Thanks once again.



#11
March 13, 2013 at 11:04:46
The "more" command was enhanced somewhere along the line to recognize when it's being run in a pipe (or with redirected output) and not "hang" every 23 lines, so I just used it to skip the 4 "header" lines (same as "for /f skip=4..." in the first script).
As I mentioned, the first script does remove trailing spaces (as well as blank lines). To use the second script, you will need another component in the pipeline. Since a batch solution would have the same constraints as script #1 (possible special characters in the stream), I would use vbscript:
'==== begin vbscript rtrim.vbs
do until wscript.stdin.atendofstream
wscript.stdout.writeline rtrim(wscript.stdin.readline)
loop
'==== end rtrim.vbs: note this will still retain blank lines, as written.
Here's one that removes blank lines as well:
'==== begin vbscript rtrim2.vbs
do until wscript.stdin.atendofstream
a=rtrim(wscript.stdin.readline)
if len(a)>0 then wscript.stdout.writeline a
loop
'==== end rtrim2.vbs
Put it at the end of the pipeline (//nologo keeps the cscript banner out of the output file):

more +4 kerval2 | findstr /v /r /b "^E...=.* ^C=" | cscript //nologo rtrim.vbs >kerval9
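As a cross-check of what the two trimming scripts above are meant to do, here is a small Python sketch of the same filter. It is an illustration rather than a drop-in replacement: the function name rtrim_lines and the drop_blank switch are hypothetical, and unlike VBScript's rtrim (which strips spaces only) it also strips trailing tabs, since the question asked for both.

```python
def rtrim_lines(lines, drop_blank=False):
    """Strip trailing spaces and tabs from each line.
    With drop_blank=True, lines that end up empty are removed as well."""
    out = []
    for line in lines:
        trimmed = line.rstrip(" \t")
        if drop_blank and not trimmed:
            continue
        out.append(trimmed)
    return out


sample = ["R001=ABC   ", "\t", "LA PLATEFORME\t ", ""]

print(rtrim_lines(sample))                   # like rtrim.vbs: blank lines kept
print(rtrim_lines(sample, drop_blank=True))  # like rtrim2.vbs: blank lines removed
```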



#12
March 13, 2013 at 13:08:30
nbrane,

I have successfully tried what you suggested, calling rtrim.vbs from the batch file.
Thank you for your great idea and your help.

Thanks.



#13
March 14, 2013 at 02:13:12
I am moving on to comparing two files (f1, f2), outputs of my program, which are very similar, in order to write the significant differences to another file, f3. However, some information, such as dates, should not be compared.
Therefore I want to anonymize all date values. The date format is [0-9]{2}[A-Z]{3} or [0-9]{2}[A-Z]{3}[0-9]{2}, so I use the command
'findstr /r ^[0-9]{2}[A-Z]{3}[0-9]{0,2} kerval2' to first find all strings in the 'kerval2' file that match this regex, and then replace each of them with the single string "Date". The goal is to have "Date" everywhere a date value appears in the files. Unfortunately it doesn't work at all.

Please look at the two example files I am using (notice that the dates differ between f1 and f2; I'd like to ignore them when I compare the files):
Note also that a date can look like "21DEC" or "29DEC13".

f1.txt

R001=SN..A.DECONNECTE
R002=FRPNO FRPNO$TSA.A..SN.FR
21DEC
******************************************************
*
* WELCOME PLATEFORME
*
*
******************************************************

R003=ITINERAIRE-
1 SN 5024 OP16MAR 6 FRLPD FRPLY SS1 22:30 06:30 16MAR 7 ED
SSVC
PLACES ATTRIBUEES -
1.AUTO/A1 15MAR FRLPD 17MAR FRPLY

R004= 1.1BX/CUITE-CV
ITINERAIRE-
1 SN 5024 OP19MAR 6 FRLPD FRPLY BI1 22:30 06:30 19MAR 7 ED
SSVC
PLACES ATTRIBUEES -
1.AUTO/A1 25MAR FRLPD 15MAR FRPLY
DATE LIMITE DE RETRAIT -
1.TL2230 /05FEB
TELEPHONE -
1. NC
RECU DE-TE
C-BD4514 M-BD4514
FRPNO.FRHDQ4DSX 1127/29JAN13 QYOXPL H

f2.txt

R001=SN..A.DECONNECTE
R002=FRPNO FRPNO$TSA.A..SN.FR
02FEB
******************************************************
*
* WELCOME PLATEFORME
*
*
******************************************************

R003=ITINERAIRE-
1 SN 5024 OP14JUN 6 FRLPD FRPLY SS1 20:30 06:30 14JUN 7 ED
SSVC
PLACES ATTRIBUEES -
1.AUTO/A1 17JUN FRLPD 17JUNFRPLY

R004= 1.1BX/CUITE-CV
ITINERAIRE-
1 SN 5024 OP20APR 6 FRLPD FRPLY BI1 09:30 16:30 22APR 7 ED
SSVC
PLACES ATTRIBUEES -
1.AUTO/A1 20APR FRLPD 22APR FRPLY
DATE LIMITE DE RETRAIT -
1.TL2230 /05APR
TELEPHONE -
1. NC
RECU DE-TE
C-BD4514 M-BD4514
FRPNO.FRHDQ4DSX 1127/09APR13 ZAGMKL H

After anonymizing the dates, I then compare each Rxyz block present in f1 with the same block in f2 and write the result of each comparison to f3, something like this:
read R001 from f1 and R001 from f2 --> write to f3 "R001 is OK" or "R001 is KO"
and so on...
Can this kind of process be done properly using a batch file that calls a vbs script?

The way to anonymize dates will help me, since I could then anonymize other irrelevant information such as "ZAGMKL" within R004 in f2 and "QYOXPL" within R004 in f1.

Thank you again for your feedback.
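The block-by-block comparison described in this post could look roughly like the Python sketch below. It assumes the inputs have already been anonymized, that each block starts with a line beginning "Rxyz=" (three digits), and that a block present in only one file counts as KO; split_blocks and compare_blocks are hypothetical names.

```python
import re

# A block starts at a line such as "R001=..." (R followed by three digits).
TAG = re.compile(r"^(R\d{3})=")


def split_blocks(text):
    """Map each Rxyz tag to the list of lines in its block (tag line included)."""
    blocks = {}
    current = None
    for line in text.splitlines():
        match = TAG.match(line)
        if match:
            current = match.group(1)
            blocks[current] = []
        if current is not None:
            blocks[current].append(line.rstrip())
    return blocks


def compare_blocks(text1, text2):
    """Return one 'Rxyz is OK' or 'Rxyz is KO' line per tag found in either text."""
    blocks1, blocks2 = split_blocks(text1), split_blocks(text2)
    report = []
    for tag in sorted(set(blocks1) | set(blocks2)):
        status = "OK" if blocks1.get(tag) == blocks2.get(tag) else "KO"
        report.append(f"{tag} is {status}")
    return report


f1 = "R001=SN..A.DECONNECTE\nR002=FRPNO\n*****\n\nR003=ITINERAIRE-\n"
f2 = "R001=SN..A.DECONNECTE\nR002=FRPNO\n*****\n\nR003=OTHER\n"

print("\n".join(compare_blocks(f1, f2)))
```

Writing the report to f3 is then a matter of saving the returned lines to a file.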



#14
March 14, 2013 at 17:08:39
Well, I'm "sucky" at regexp, but I'll work on it. I suggest re-posting this part (since it's basically a different problem) to get higher-caliber talent. Meantime, I'll brush up on my non-existent regexp! :-)


#15
March 15, 2013 at 14:41:37
✔ Best Answer
Ok, here's what I could come up with:
'===== begin vbscript anomyze.vbs, usage: cscript //nologo anomyze.vbs infile > outfile
set regex=new regexp
regex.ignorecase=false
regex.global=true
set fso=createobject("scripting.filesystemobject")
z=fso.opentextfile(wscript.arguments(0),1).readall
dim n(6)
'== first, the base: one or two digits followed by a month abbreviation
n(0)="\d{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
'now add qualifiers
n(1)=n(0)+"13"
n(2)=n(0)+"\b"
n(0)=n(0)+"13 [A-Z]{6} "
'n(3)..n(5) are the replacements for patterns n(0)..n(2)
n(3)="------- ------ "
n(4)="*******"
n(5)="*****"
'apply the most specific pattern first
for i=0 to 2
regex.pattern=n(i)
z=regex.replace(z,n(i+3))
next
wscript.echo z
'====== end vbscript

You will note that the year is "hard-wired" as 13. You could make it more flexible, if need be: replace "13 [" with "1\d [" (the same as "1\d{1,1} [").



#16
March 16, 2013 at 10:24:56
Thanks Nbrane!
There was still a small problem with replacing the dates: only the dates shown as "-----" came out right, while the "-------" ones did not.
I have just changed your regex group; here is my version:
n(1)="\d{1,2}(J|F|M|A|S|O|N|D)"
n(0)=n(1)+"13"
n(1)=n(1)+"\b"
n(2)="-------" 'for date with year "13"
n(3)="*****" 'for date without year

You're good Nbrane
Thank you very much
See you.




#17
March 16, 2013 at 11:41:54
Oh, it's because I thought there would always be a space after the "13" (and likewise after the ddMMM format dates; let me know if that's not the case). If you want to anonymize the 6 bytes following the space after the "13", we may need a third regexp. This might fix the current problem. Change this line:
n(0)=n(1)+"13 [A-Z]{6} "
(I don't know where that space came from! It shouldn't have been there regardless.)
to this:
n(0)=n(1)+"13"
But to get those 6 bytes, it needs more elements in array n, and one more pass through the loop:
dim n(6)
n(0)="\d{1,2}(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
n(1)=n(0)+"13"
n(2)=n(0)+"\b"
n(0)=n(0)+"13 [A-Z]{6} "
n(3)="------- ------ "
n(4)="*******"
n(5)="*****"
...
for i=0 to 2
regex.pattern=n(i)
z=regex.replace(z,n(i+3))
next
wscript.echo z

The order in which the expressions are applied is important: in the wrong order, targets get replaced too early and are no longer recognized by the later patterns. I'll go ahead and put this change into my #15 script.
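To make the ordering point concrete, here is a Python sketch of the same three-step masking. It mirrors the patterns of the #15 script but is not that script: it uses \d{2} for the year instead of the hard-wired 13, and the names MONTHS, DATE, RULES and anonymize are hypothetical.

```python
import re

MONTHS = "(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
DATE = r"\d{1,2}" + MONTHS

# Most specific rule first. If the plain date-with-year rule ran first,
# the date in front of the six-letter code would already be masked and
# the first rule could no longer find it.
RULES = [
    (re.compile(DATE + r"\d{2} [A-Z]{6} "), "------- ------ "),  # date, year, 6-letter code
    (re.compile(DATE + r"\d{2}"), "*******"),                    # date with year
    (re.compile(DATE + r"\b"), "*****"),                         # date without year
]


def anonymize(text):
    """Apply the masking rules in order and return the masked text."""
    for pattern, mask in RULES:
        text = pattern.sub(mask, text)
    return text


print(anonymize("FRPNO.FRHDQ4DSX 1127/29JAN13 QYOXPL H"))
print(anonymize("1 SN 5024 OP16MAR 6 FRLPD FRPLY"))
```

A date glued to the following text, such as "17JUNFRPLY" in f2, is left alone by the last rule because no word boundary follows the month.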



#18
March 18, 2013 at 05:30:17
Hi,
Is it possible, using the 'findstr' command, to retrieve just one specific block (all the text it contains), such as "R003=.*" up to the next "E004" or "C=" tag, excluding those tags?
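As far as I know, findstr tests one line at a time, so on its own it cannot return a multi-line block. A small state machine does the job; the Python sketch below shows the idea, assuming blocks start with a line such as "R003=" and end just before the next "Exyz=" or "C=" line. BOUNDARY and extract_block are hypothetical names.

```python
import re

# A block ends at the next E tag (E + three digits + "=") or comment line.
BOUNDARY = re.compile(r"^(E\d{3}=|C=)")


def extract_block(lines, tag):
    """Return the lines of the block that starts with 'tag=' up to,
    but not including, the next E-tag or C= line."""
    block = []
    inside = False
    for line in lines:
        if inside:
            if BOUNDARY.match(line):
                break
            block.append(line)
        elif line.startswith(tag + "="):
            inside = True
            block.append(line)
    return block


sample = [
    "E003=ZTTCP DISP ALL",
    "",
    "R003=CSMP0097I 11.27.31 CPU-P SS-BSS SSU-SN IS-01",
    "TTCP0184I 11.27.31 IP CONNECTIONS DISPLAY",
    "END OF DISPLAY",
    "",
    "E004=QXI",
]

print("\n".join(extract_block(sample, "R003")))
```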

Thank you.



#19
March 20, 2013 at 02:02:31
I have a batch file, toto.bat, that takes two arguments from the command line:
@echo off
set arg1=%1
set arg2=%2
more +1 %arg1%|findstr /I /r /b /C:"^E.*=%arg2%">out

To run it from the command line, I type
toto.bat TEST.txt "SI1^TSTS^$$" (with arg1=TEST.txt and arg2="SI1^TSTS^$").
The console output shows:
C:\Users\mna\Desktop\Nouveau dossier\tags.vbs
VBScript: L'entrée dépasse la fin du fichier
(in English: "Input past end of file")
and the output file, "out", is empty.

How can I fix it?
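One thing that may contribute here, stated as a guess since tags.vbs is not shown: the argument contains ^, $ (and elsewhere *), which are regular expression metacharacters for findstr /r, and ^ is also the escape character of cmd.exe. A plain string comparison avoids the escaping question entirely. The Python sketch below illustrates that idea only; find_commands is a hypothetical name and the matching rule (text after the first "=" starts with the argument, case-insensitively) is an assumption about what the batch file intends.

```python
def find_commands(lines, command):
    """Return the E-tag lines whose text after the first '=' starts with
    command, compared literally and case-insensitively (like findstr /I)."""
    hits = []
    wanted = command.lower()
    for line in lines:
        tag, sep, rest = line.partition("=")
        if sep and tag.upper().startswith("E") and rest.lower().startswith(wanted):
            hits.append(line)
    return hits


sample = [
    "E001=SO*",
    "R001=00001713CONNECTEZ VOUS A",
    "E002=SI1^TSTS^$",
    "R002=FRPNO FRHDQ$DSX.A..SN.FR",
]

print(find_commands(sample, "SI1^TSTS^$"))
```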

Thanks.

