Solved Batch File to Write Text Between Two Words to New .txt File

August 29, 2020 at 22:47:45
Specs: Windows 8.1
I am trying to write a batch file in Windows 8 that will read a specific .txt file, extract all the text between two specific words and output the results into a new .txt file.

I have used an answer from a previous question here on Computing.net but it is not working.
Mechanix2Go left this answer and I have attempted to modify it for my specific needs.

Here's what I have so far:

:: ===== script starts here ===============
::
:: che.bat Sun 03-06-2012 13:13:54.10
@echo off > newfile & setLocal enableDELAYedeXpansioN

set C=
set A=

for /f "tokens=1* delims=[]" %%a in ('find /v /n "\\\\\\" ^< sub_list.txt') do (
echo.%%b|find "Subscribe to" > nul && if not defined C set C=%%a
echo.%%b|find "unsubscribeAccessibility" > nul && if not defined A set A=%%a
)

for /f "tokens=1* delims=[]" %%a in ('find /v /n "\\\\\\" ^< sub_list.txt') do (
if %%a equ !C! (
set S=%%b
set S=!S:Subscribe to=!
>> newfile echo.!S!
)
if %%a gtr !C! if %%a lss !A! >> newfile echo.%%b
)
goto :eof

::====== script ends here =================



#1
September 1, 2020 at 05:29:31
You can try this batch file, which uses a regex in PowerShell to extract the data between two strings:

Extarct_Data_Between2Strings-PS.bat

@echo off
Color 0A & Mode 80,4
Title Extract Data Text Between Two Strings With Batch And Powershell by Hackoo 2020
Set InputFile="%~1"
If [%InputFile%] EQU [""] Goto :Help
Set "OutPutFile=%~dpn1_output.txt"
Set "StartString=Start"
Set "EndString=End"
echo(
echo(   Please wait a while ... Extracting Data from "%~nx1"
Call :Extract_Between %InputFile% "%StartString%" "%EndString%" "%OutPutFile%"
If Exist "%OutPutFile%" Start "" "%OutPutFile%"
Exit
REM -----------------------------------------------------------------------------
:Extract_Between <InputFile> <StartString> <EndString> <OutPutFile>
set "pscmd=$Data=GC '"%~1"';"
set "pscmd=%pscmd% $StartString='"%~2"';"
set "pscmd=%pscmd% $EndString='"%~3"';"
set "pscmd=%pscmd% $pattern=\"(?:$StartString)([\S\s]*)(?:$EndString)\";"
set "pscmd=%pscmd% $result=[regex]::Match($Data,$pattern).Groups[1].Value;"
set "pscmd=%pscmd% $result | Out-File '"%~4"';"
Powershell -command "&{%pscmd%}"
Exit /B
REM -----------------------------------------------------------------------------
:Help
Color 0C
echo(
echo(     You should drag and drop a file over, 
echo(     this script "%~nx0" for Extracting Data !
Timeout /T 10 /NoBreak>nul
Exit
REM ----------------------------------------------------------------------------

And here is another batch file, which uses a regex in VBScript to extract the data between two strings:

Extarct_Data_Between2Strings-VBS.bat

@echo off
Color 0A & Mode 80,4
Title Extract Data Text Between Two Strings With Batch And Vbscript by Hackoo 2020
Set "StartString=Start"
Set "EndString=End"
Set InputFile="%~1"
If [%InputFile%] EQU [""] Goto :Help
Set "OutPutFile=%~dpn1_output.txt"
echo(
echo(   Please wait a while ... Extracting Data from "%~nx1"
Call :Extract %InputFile% "%OutPutFile%" "%StartString%" "%EndString%"
If Exist "%OutPutFile%" Start "" "%OutPutFile%"
Exit
::----------------------------------------------------------
:Extract <InputFile> <OutPutFile> <StartString> <EndString>
>"%tmp%\%~n0.vbs" (
	echo Data = WScript.StdIn.ReadAll
	echo Data = Extract(Data,"(?:%~3)([\S\s]*)(?:%~4)"^)
	echo WScript.StdOut.WriteLine Data
	echo Function Extract(Data,Pattern^)
	echo    Dim oRE,oMatches,Match,Line
	echo    set oRE = New RegExp
	echo    oRE.IgnoreCase = True
	echo    oRE.Global = True
	echo    oRE.Pattern = Pattern
	echo    set oMatches = oRE.Execute(Data^)
	echo    If not isEmpty(oMatches^) then
	echo        Extract = oMatches(0^).SubMatches(0^)
	echo    End if
	echo End Function
)
cscript /nologo "%tmp%\%~n0.vbs" < "%~1" > "%~2"
If Exist "%tmp%\%~n0.vbs" Del "%tmp%\%~n0.vbs"
exit /b
::----------------------------------------------------------
:Help
Color 0C
echo(
echo(     You should drag and drop a file over, 
echo(     this script "%~nx0" for Extracting Data !
Timeout /T 10 /NoBreak>nul
Exit
REM ---------------------------------------------------------
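A note for readers comparing the two scripts above: both use the greedy group ([\S\s]*) and, as far as I can tell, both keep only the first match. The sketch below is in Python rather than batch, purely for illustration, and the sample text is made up. It shows how a greedy group differs from a lazy one when the start and end strings occur more than once.

```python
import re

# Hypothetical sample with two Start ... End pairs.
text = "Start cat End filler Start dog End"

# Greedy: [\S\s]* runs on to the LAST "End", so everything comes back as one match.
greedy = re.search(r"(?:Start)([\S\s]*)(?:End)", text).group(1)

# Lazy: [\S\s]*? stops at the FIRST "End" after each "Start".
lazy = re.findall(r"(?:Start)([\S\s]*?)(?:End)", text)

print(repr(greedy))
print(lazy)
```

If the input holds many Start/End pairs, the lazy form combined with a find-all style call is probably closer to what is wanted.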

message edited by Hackoo



#2
September 5, 2020 at 02:38:27
It might be a REAL GOOD IDEA to say what you're trying to do.

=====================

M2



#3
September 5, 2020 at 13:58:36
Hello Mechanix2Go, glad you're here. The solution you supplied for another's question helps me partially.
I was hoping you could show me how to send the extracted data to a txt file.

I thought I did say what I was trying to do.
I wrote:
I am trying to write a batch file in Windows 8 that will read a specific .txt file, extract all the text between two specific words and output the results into a new .txt file.

Do you mean you want me to explain the PURPOSE for needing a batch file in Windows 8 that will read a specific .txt file, extract all the text between two specific words and output the results into a new .txt file?

More than willing to do that if it helps but not quite sure what you want.



#4
September 5, 2020 at 20:47:16
I gave up on PURPOSE long ago.

Just a few lines to clarify.

If you have IN.TXT:
ONE cat TWO
THREE dog FOUR
FIVE Bird SIX

Do you want whatever is between THREE and FOUR?
Is it case-sensitive?
What does OUT.TXT look like?
cat
cat
CAT
cat

=====================

M2



#5
September 5, 2020 at 20:51:22
OBW for future reference, it might be useful to put a link to the original script.

OBW2 Just think, if I'd done something productive for the last 30 years, I might be rich.

=====================

M2



#6
September 6, 2020 at 07:14:21
I don't know why and how my reply is hidden ???


#7
September 6, 2020 at 17:19:55
Lol.
Yes. I want every word/s between THREE and FOUR to be written to OUT.TXT
THREE and FOUR are never changing keywords.
But IN.TXT has many locations within it that look like this: THREE word/s FOUR.
word/s is always different.

So if IN.TXT looks like this (formatted here for easier reading):
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna THREE cat FOUR, quis nostrud exercitation
THREE Dog Treats FOUR ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
THREE GIRAFFE FOUR irure dolor in reprehenderit in voluptate velit esse cillum
THREE ElEpHant FOUR dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Three fox tail FOUR.

OUT.TXT would look like this:
cat
dog treats
giraffe
elephant
fox tail

OUT.TXT does not need to be case-sensitive to IN.TXT
If the channel is Fred's Auto Repair, I don't care if the output is fred's auto repair.

The "word/s" I am after are the names of channels on YouTube in a subscription feed.
It's the name of the channel I am after. I want an easy to read, list of channel names the user is subscribed to.
A new line for every channel name.

When you use YouTube's Export Subscriptions button, it outputs a .txt file that is a horrifying wall of text.
HTML code everywhere. But there is a pattern. The name of the channel ALWAYS follows the words "Subscribe to" and comes before the word "unsubscribeAccessibility".

In the future if YT changes its code, I can simply change the two keywords to the updated format and get the channel names once again.

Here is the 8-year-old link to your answer to Cheffy that got me started:
https://www.computing.net/answers/p...

I have very little practice with writing batch code so if you can be as detailed as possible it will help.

Thank you for your time.
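To pin down the requirement described above, here is a minimal sketch in Python (not batch, and not anything posted in this thread). The function name extract_between and the shortened sample text are assumptions made for illustration. It collects whatever sits between the two keywords, ignores case when matching, and lowercases the result, since case in OUT.TXT does not matter.

```python
import re

def extract_between(text, start, end):
    """Return every run of text found between start and end, lowercased."""
    # re.escape keeps the keywords literal; (.*?) is lazy so each match
    # stops at the nearest end keyword.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    matches = re.findall(pattern, text, flags=re.IGNORECASE | re.DOTALL)
    return [m.strip().lower() for m in matches]

# Shortened version of the IN.TXT example above.
sample = (
    "dolore magna THREE cat FOUR, quis nostrud exercitation\n"
    "THREE Dog Treats FOUR ullamco laboris\n"
    "THREE GIRAFFE FOUR irure dolor\n"
    "THREE ElEpHant FOUR dolore eu fugiat\n"
    "Three fox tail FOUR.\n"
)

for name in extract_between(sample, "THREE", "FOUR"):
    print(name)
```

Note that the keywords are matched as plain substrings, not whole words, so a keyword that also appears inside a longer word would be picked up too.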



#8
September 7, 2020 at 08:27:10
✔ Best Answer
If the lines are too long or FOUR comes before THREE or other weirdness, it will fail.
And you'll need somebody to write a vbs.

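:: ==================================
:: 34A.BAT extract text between THREE and FOUR
:: The redirect on the next line creates an empty OUT.TXT before anything is appended
@echo off > OUT.TXT & setLocal enableDELAYedeXpansioN

:main
:: Keep only the lines of IN.txt that contain both "three" and "four" (any case)
for /f "tokens=* delims=. " %%a in ( 'find /i "three" ^< IN.txt ^| find /i "four" ') do (
set LINE=%%a
rem Turn periods into spaces so "FOUR." still splits into the word FOUR
set LINE=!LINE:.= !
call :sub1 !LINE!
) >> OUT.txt
goto :eof

:sub1
:: Walk the words of the line and collect those between THREE and FOUR
set GO=
set S=
for %%i in (%*) do (
if /i %%i equ four set GO=
if defined GO set S=!S! %%i
if /i %%i equ three set GO=Y
)
echo.!S!
goto :eof
::====== script ends here =================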

=====================

M2



#9
September 7, 2020 at 17:33:35
It sort of works.
As you said, I guess there are just too many lines packed together too tightly.

When I copy and paste something like this, at the top of IN.txt,
Subscribe to Fred's Auto Repair unsubscribeAccessibility
your batch creates OUT.txt that looks like,
Fred's Auto Repair

Which means it is finding the keywords and grabbing the text in between.
Unfortunately it won't tackle the remainder of the wall of text.

I will look into a vbs solution.
Thanks again.



#10
September 7, 2020 at 19:12:13
When I paste in:
Subscribe to Fred's Auto Repair unsubscribeAccessibility
I get the expected output.
You're not using some kimchi editor like MsWord, ARE YOU?!

=====================

M2



#11
September 7, 2020 at 19:47:43
Post 7 lines that don't work

=====================

M2



#12
September 7, 2020 at 21:43:30
Better yet, post ALL, or at least a large sample, of the html that includes the targets. Sometimes there's stuff in the html (tags) that makes for a better trigger than strings. In the meantime, maybe try this vbscript (save as 'test.vbs' just to test, and run it with cscript so the output goes to the console instead of pop-up boxes):
'==== begin vbscript
set fso=createobject("scripting.filesystemobject")
fil="in.txt"
t1="subscribe to"
t2="unsubscribeaccessibility"
x=fso.opentextfile(fil).readall
'this just removes any line-breaks that might disrupt the pattern recognition
x=replace(x,vbcrlf,"")
'split(text, delimiter, count, compare): count -1 returns every piece,
'compare 1 is a text compare, so the lowercase t1/t2 still match "Subscribe to"
z=split(x,t1,-1,1)
'z(0) is whatever precedes the first t1, so start at 1
for i=1 to ubound(z)
a=split(z(i),t2,-1,1)
wscript.echo a(0)
next
'---- end vbscript
The 1 at the end of each split call makes the comparison case-insensitive. It can probably be optimized to use less memory, if it happens to come close to working.
@M2: you're the greatest! Suffer ye not this existential angst. I'm a sad case myself.
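For anyone more at home in Python, the same split-then-split idea can be sketched as below. This is an illustration under assumptions, not a port that has been run against the real export: the function name names_by_split and the sample string are made up, and the result is lowercased on the way through.

```python
def names_by_split(text, start, end):
    """Split on start, then keep whatever precedes end in each piece."""
    # Drop line breaks first, as the vbscript does, so a marker broken
    # across two lines is still found.
    flat = text.replace("\r\n", "").replace("\n", "")
    lowered = flat.lower()  # compare in lowercase so case does not matter
    pieces = lowered.split(start.lower())[1:]  # piece 0 precedes the first start
    return [piece.split(end.lower())[0].strip() for piece in pieces]

# Hypothetical input in the shape described in this thread.
sample = (
    "Subscribe to Fred's Auto Repair unsubscribeAccessibility junk "
    "Subscribe to Fox News unsubscribeAccessibility"
)
print(names_by_split(sample, "Subscribe to", "unsubscribeAccessibility"))
```

As with the vbscript, a piece with no end marker in it is returned whole, so stray hits on the start string will show up as long junk entries.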

message edited by nbrane



#13
September 7, 2020 at 22:24:32
This is from what you are calling IN.TXT

Subscribe to Fred's Auto Repair unsubscribeAccessibility
<!doctype html><html style="font-size: 10px;font-family: Roboto, Arial, sans-serif;" lang="en-US" dir="ltr" gl="US"><head><meta http-equiv="origin-trial" data-feature="Web
...
{"accessibilityData":{"label":
Subscribe to Fox News }}
"unsubscribeAccessibility":{"accessibilityData":{"label":"Unsubscribe from Fox News."}},"notificationPreferenceButton":{"subscriptionNotificationToggleButtonRenderer":

{"states":[{"stateId":2,"nextStateId":2,"state":{"buttonRenderer":{"style":"STYLE_TEXT","size":"SIZE_DEFAULT","isDisabled":false,"icon":

{"iconType":"NOTIFICATIONS_ACTIVE"},"accessibility":{"label":"Current setting is all notifications. Tap to change your notification setting for Fox

News"},"trackingParams":"CE8Q8FsiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","accessibilityData":{"accessibilityData":{"label":"Current setting is all notifications. Tap to change

your notification setting for Fox News"}}}}},{"stateId":3,"nextStateId":3,"state":{"buttonRenderer":{"style":"STYLE_TEXT","size":"SIZE_DEFAULT","isDisabled":false,"icon":

{"iconType":"NOTIFICATIONS_NONE"},"accessibility":{"label":"Current setting is personalized notifications. Tap to change your notification setting for Fox

News"},"trackingParams":"CE4Q8FsiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","accessibilityData":{"accessibilityData":{"label":"Current setting is personalized notifications. Tap

to change your notification setting for Fox News"}}}}},{"stateId":0,"nextStateId":0,"state":{"buttonRenderer":

{"style":"STYLE_TEXT","size":"SIZE_DEFAULT","isDisabled":false,"icon":{"iconType":"NOTIFICATIONS_OFF"},"accessibility":{"label":"Current setting is receive no

notifications. Tap to change your notification setting for Fox News"},"trackingParams":"CE0Q8FsiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","accessibilityData":

{"accessibilityData":{"label":"Current setting is receive no notifications. Tap to change your notification setting for Fox

News"}}}}}],"currentStateId":3,"trackingParams":"CEkQl_kBIhMIj4_Uz4bC6wIVEtLECh3ASAUb","command":{"commandExecutorCommand":{"commands":

[{"openPopupAction":{"popup":{"menuPopupRenderer":{"items":[{"menuServiceItemRenderer":{"text":{"runs":[{"text":"All"}]},"icon":

{"iconType":"NOTIFICATIONS_ACTIVE"},"serviceEndpoint":

{"clickTrackingParams":"CEwQ67UEGAAiEwiPj9TPhsLrAhUS0sQKHcBIBRsyHFBSRUZFUkVOQ0VfQUxMX05PVElGSUNBVElPTlM=","commandMetadata":

{"webCommandMetadata":

{"url":"/service_ajax","sendPost":true,"apiUrl":"/youtubei/v1/notification/modify_channel_preference"}},"modifyChannelNotificationPreferenceEndpoint":

{"params":"ChhVQ1hJSmdxbklJMlpPSU5TV05PR0ZUaEESAggCGAAgBA%3D

%3D"}},"trackingParams":"CEwQ67UEGAAiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","isSelected":false}},{"menuServiceItemRenderer":{"text":{"runs":

[{"text":"Personalized"}]},"icon":{"iconType":"NOTIFICATIONS_NONE"},"serviceEndpoint":

{"clickTrackingParams":"CEsQ7LUEGAEiEwiPj9TPhsLrAhUS0sQKHcBIBRsyElBSRUZFUkVOQ0VfREVGQVVMVA==","commandMetadata":{"webCommandMetadata":

{"url":"/service_ajax","sendPost":true,"apiUrl":"/youtubei/v1/notification/modify_channel_preference"}},"modifyChannelNotificationPreferenceEndpoint":

{"params":"ChhVQ1hJSmdxbklJMlpPSU5TV05PR0ZUaEESAggBGAAgBA%3D

%3D"}},"trackingParams":"CEsQ7LUEGAEiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","isSelected":true}},{"menuServiceItemRenderer":{"text":{"runs":[{"text":"None"}]},"icon":

{"iconType":"NOTIFICATIONS_OFF"},"serviceEndpoint":

{"clickTrackingParams":"CEoQ7bUEGAIiEwiPj9TPhsLrAhUS0sQKHcBIBRsyG1BSRUZFUkVOQ0VfTk9fTk9USUZJQ0FUSU9OUw==","commandMetadata":

{"webCommandMetadata":

{"url":"/service_ajax","sendPost":true,"apiUrl":"/youtubei/v1/notification/modify_channel_preference"}},"modifyChannelNotificationPreferenceEndpoint":

{"params":"ChhVQ1hJSmdxbklJMlpPSU5TV05PR0ZUaEESAggDGAAgBA%3D

%3D"}},"trackingParams":"CEoQ7bUEGAIiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","isSelected":false}}]}},"popupType":"DROPDOWN"}}]}}}},"subscribedEntityKey":"EhhVQ1hJS

mdxbklJMlpPSU5TV05PR0ZUaEEgMygB","onSubscribeEndpoints":[{"clickTrackingParams":"CEUQmysiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","commandMetadata":

{"webCommandMetadata":{"url":"/service_ajax","sendPost":true,"apiUrl":"/youtubei/v1/subscription/subscribe"}},"subscribeEndpoint":{"channelIds":

["UCXIJgqnII2ZOINSWNOGFThA"],"params":"GAA%3D"}}],"onUnsubscribeEndpoints":

[{"clickTrackingParams":"CEUQmysiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","commandMetadata":{"webCommandMetadata":

{"url":"/service_ajax","sendPost":true}},"signalServiceEndpoint":{"signal":"CLIENT_SIGNAL","actions":[{"openPopupAction":{"popup":{"confirmDialogRenderer":

{"trackingParams":"CEYQxjgiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","dialogMessages":[{"runs":[{"text":"Unsubscribe from "},{"text":"Fox News"},{"text":"?"}]}],"confirmButton":

{"buttonRenderer":{"style":"STYLE_BLUE_TEXT","size":"SIZE_DEFAULT","text":{"runs":[{"text":"Unsubscribe"}]},"serviceEndpoint":

{"clickTrackingParams":"CEgQ8FsiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","commandMetadata":{"webCommandMetadata":

{"url":"/service_ajax","sendPost":true,"apiUrl":"/youtubei/v1/subscription/unsubscribe"}},"unsubscribeEndpoint":{"channelIds":

["UCXIJgqnII2ZOINSWNOGFThA"],"params":"GAA%3D"}},"accessibility":

{"label":"Unsubscribe"},"trackingParams":"CEgQ8FsiEwiPj9TPhsLrAhUS0sQKHcBIBRs="}},"cancelButton":{"buttonRenderer":

{"style":"STYLE_TEXT","size":"SIZE_DEFAULT","text":{"runs":[{"text":"Cancel"}]},"accessibility":

{"label":"Cancel"},"trackingParams":"CEcQ8FsiEwiPj9TPhsLrAhUS0sQKHcBIBRs="}},"primaryIsCancel":false}},"popupType":"DIALOG"}}]}}]}},"trackingParams":"CEQQ2jAYA

CITCI-P1M-GwusCFRLSxAodwEgFGw=="}},{"channelRenderer":{"channelId":"UCyfg9nFSHL82SWVWN3YUe2w","title":{"simpleText":"Full Send Alaskan

Overland"},"navigationEndpoint":{"clickTrackingParams":"CDgQ2jAYASITCI-P1M-GwusCFRLSxAodwEgFGw==","commandMetadata":{"webCommandMetadata":

{"url":"/channel/UCyfg9nFSHL82SWVWN3YUe2w","webPageType":"WEB_PAGE_TYPE_CHANNEL","rootVe":3611}},"browseEndpoint":

{"browseId":"UCyfg9nFSHL82SWVWN3YUe2w"}},"thumbnail":{"thumbnails":[{"url":"https://yt3.ggpht.com/a/AATXAJz5ng3WEPmYBQbuggmmCb6VGc6s_Y32GdUTig4A=s48

-c-k-c0xffffffff-no-rj-mo","width":48,"height":48},{"url":"https://yt3.ggpht.com/a/AATXAJz5ng3WEPmYBQbuggmmCb6VGc6s_Y32GdUTig4A=s88-c-k-c0xffffffff-no-rj-

mo","width":88,"height":88},{"url":"https://yt3.ggpht.com/a/AATXAJz5ng3WEPmYBQbuggmmCb6VGc6s_Y32GdUTig4A=s176-c-k-c0xffffffff-no-rj-

mo","width":176,"height":176}]},"descriptionSnippet":{"simpleText":"Life and adventure on the road... from pounding the pavement to pushing the limit! My passions are

adventuring, sharing and inspiring!\n\nIf anyone wants to help fund more adventures on the road (which"},"videoCountText":{"runs":[{"text":"74

videos"}]},"subscriptionButton":{"type":"FREE","subscribed":true},"subscriberCountText":{"runs":[{"text":"31.4K subscribers"}]},"subscribeButton":

{"subscribeButtonRenderer":{"buttonText":{"runs":[{"text":"Subscribed"}]},"subscriberCountText":

{"simpleText":"31.4K"},"subscribed":true,"enabled":true,"type":"FREE","channelId":"UCyfg9nFSHL82SWVWN3YUe2w","showPreferences":false,"subscriberCountWithUnsub
scribeText":{"simpleText":"31.4K"},"subscribedButtonText":{"runs":[{"text":"Subscribed"}]},"unsubscribedButtonText":{"runs":
[{"text":"Subscribe"}]},"trackingParams":"CDkQmysiEwiPj9TPhsLrAhUS0sQKHcBIBRs=","unsubscribeButtonText":{"runs":
[{"text":"Unsubscribe"}]},"longSubscriberCountText":{"runs":[{"text":"31.4K subscribers"}]},"shortSubscriberCountText":{"simpleText":"31.4K"},"subscribeAccessibility":
{"accessibilityData":{"label":"Subscribe to Full Send Alaskan Overland."}},"unsubscribeAccessibility":{"accessibilityData":{"label":"Unsubscribe from Full Send Alaskan



#14
September 7, 2020 at 22:29:58
Kimchi?
For IN.TXT I am viewing page source, copying and pasting to NotePad.
For the batch file I just copy and paste your code into Notepad.
Is that what you are asking?

Like I said, it works for me too, kind of.
After the initial output; Fred's Auto Repair, nothing else gets written to OUT.TXT
And I think, that is only working because I placed it at the top of the file.



#15
September 7, 2020 at 22:40:24
I appreciate all the replies and the time taken to provide answers.

Hackoo
nbrane
Mechanix2Go

I know very little about batch and less about vb scripting.
I am more familiar with Python, so I am just going in that direction.

If anyone is interested, I can post my working Python code here.

Thanks.
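For readers following along: the poster's own Python code was never shared in this thread, so what follows is only a minimal sketch of one way it might be done. It assumes the marker layout seen in the sample in #13, where the name sits in a label such as "Subscribe to Full Send Alaskan Overland." followed by "}}," and then "unsubscribeAccessibility". The names PATTERN and channel_names are hypothetical.

```python
import re

# Assumed layout, copied from the page-source sample in #13:
#   "label":"Subscribe to <name>."}},"unsubscribeAccessibility"
PATTERN = re.compile(
    r'Subscribe to (.+?)\.?"\}\},\s*"unsubscribeAccessibility"',
    re.DOTALL,
)

def channel_names(page_source):
    """Return channel names in order of appearance, without repeats."""
    names = []
    for raw in PATTERN.findall(page_source):
        name = " ".join(raw.split())  # collapse any line breaks inside a name
        if name not in names:
            names.append(name)
    return names

sample = (
    '{"accessibilityData":{"label":"Subscribe to Full Send Alaskan Overland."}},'
    '"unsubscribeAccessibility":{"accessibilityData":'
    '{"label":"Unsubscribe from Full Send Alaskan Overland."}}'
)
print(channel_names(sample))
```

To use it on a saved page, read the whole file into one string, pass it to channel_names, and write the returned names to OUT.TXT one per line. If YouTube changes the markup, the pattern would need to be adjusted to match.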



#16
September 8, 2020 at 00:59:14
I've been learning python for 6 months.

Any tips for starting into multithreading?

first CPU
then GPU
ASIC

=====================

M2



#17
September 8, 2020 at 02:24:37
If you strip the html first it seems to work OK.

:: =====================
@echo off > NEWFILE.TXT & setLocal enableDELAYedeXpansioN
copy IN.HTML myfile.txt
:main
for /f "tokens=* delims= " %%a in (MYfile.txt) do (
set S=
set S=%%a
set S=!S:^\= !
set S=!S:^/= !
set S=!S:'= !
set S=!S:"= !
set S=!S:,= !
set S=!S::= !
set S=!S:^<= !
set S=!S:^>= !
set S=!S:^(= !
set S=!S:^)= !
set S=!S:^]= !
set S=!S:^[= !
set S=!S:^}= !
set S=!S:^{= !
set S=!S:^?= !
echo.!S!
) >> NEWFILE.TXT
::move/y NEWFILE.TXT IN.txt
echo output is in NEWFILE.TXT
goto :eof
:: =================================
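The same clean-up step, replacing each markup character with a space, can be sketched in Python with a translation table. This is an illustration only: the names PUNCTUATION, TO_SPACES and strip_markup_chars are made up, and the character list is copied from the set lines in the script above.

```python
# The 15 characters the batch script turns into spaces: \ / ' " , : < > ( ) [ ] { } ?
PUNCTUATION = "\\/'\",:<>()[]{}?"
TO_SPACES = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))

def strip_markup_chars(line):
    """Replace each listed character with one space; other text is untouched."""
    return line.translate(TO_SPACES)

cleaned = strip_markup_chars('{"label":"Subscribe to Fox News"}')
print(cleaned.split())
```

Because every character is swapped for exactly one space, the line keeps its length, and splitting on whitespace afterwards gives plain words that a word-by-word search can walk through.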

=====================

M2



#18
September 8, 2020 at 17:25:05
Couldn't help you. I don't know what multithreading is.
