Solved Extract a string from an html file

July 27, 2014 at 12:09:48
Specs: Linux x86_64
I would like to extract a string from an HTML file using a .bat file. I am working on my own tech update program and am having some issues.

Currently I have been using wget and a bat file to update all of the directly linked files I can get to, but for certain scripted websites or versioned download links I cannot get it to work well.

For now I want to be able to do this:
1: I'll use wget to download this HTML file:

2: I need to use the bat file to search for a beginning search term and an ending search term, and keep only the string between the two. So, for example, on that page:

my beginning search term would be something like:
< p > If the download process does not begin automatically, please < a href = '

and my ending search term something like:
' >click here< / a >.< / p >

(I had to space out the code otherwise it would not display right on this question)

so that the resulting output of the bat script would be this link to the latest version of the file:

I was reading this page about findstr but am not sure what the easiest way to do it is:

Thank you for any help. :)

message edited by Cigam

July 27, 2014 at 18:14:38
✔ Best Answer
This might help, but I think HTML can use single and double quotes interchangeably, so that might be a consideration:

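::-- begin snippet bat
@echo off & setlocal
rem "etc" stands in for the URL of the page to download
wget -O cigam.htm etc

rem find /i prints the line containing the phrase (case-insensitive).
rem delims=' splits that line at single quotes, so token 2 is the text
rem between the first pair of quotes (the href value on that page).
rem The result is written to a file named "url".
for /f "tokens=2 delims='" %%a in ('find /i "if the download process does not begin"^<cigam.htm') do >url echo %%a
::------ end snippet bat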
This worked on the saved HTML page that you referenced. Other pages might need different approaches, and if so, it would be up to the programmer to build a more intelligent HTML interpreter. Some pages have limited or unpredictable CRLF, CR or LF line breaks, which can really mess with line-oriented, text-based approaches (such as 'FIND').
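
On the quote consideration mentioned above: if a page wrote the link with double quotes instead of single quotes, the same loop could use the double quote as the delimiter. A minimal sketch, assuming the same saved file name (cigam.htm) and the same phrase as in the snippet above. The FOR options are left unquoted here, with carets escaping the spaces, the equals signs and the quote itself:

::-- begin snippet bat
@echo off & setlocal
rem Variant for pages that use double quotes around the href value.
rem The options cannot be wrapped in quotes when the delimiter is itself
rem a double quote, so each space, "=" and the quote is escaped with ^.
for /f tokens^=2^ delims^=^" %%a in ('find /i "if the download process does not begin" ^< cigam.htm') do >url echo %%a
::------ end snippet bat

Like the original, this takes the text between the first pair of quotes on the matching line, so it would pick up the wrong value if another quoted attribute appeared earlier on that line.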

message edited by nbrane

July 27, 2014 at 22:47:52
Thank you, it worked :)
