Extracting URLs from an html file via batch

October 20, 2009 at 01:45:13
Specs: Windows XP
I have a folder full of html files. What I would like to do is go through all the html files, extract all the URLs in them and write them to a single txt file. Is there a way to do this with a batch script?

I'm really a noob when it comes to batch files, so any help would be very much appreciated.

October 22, 2009 at 15:14:56
Something like this in a .bat file (at the command prompt, use %a instead of %%a):
set tag="<a href="
for %%a in (*.htm) do find /i %tag% "%%a" >> test

Oops, the forum's HTML processor removed the search string from the
statement because it contained HTML tags, resulting in chaos.
I used the variable %tag% to try to show what I mean. The tag is just the less-than sign followed by
a href and an equals sign, with double quotes around the whole string for find.
This gets the rough output, but then you need to format the file by snipping off everything up to and including <a href=" and then snipping off everything after the URL's closing double quote. These snips might be done with an XP batch command
using the tokens/delims options of for /f, or maybe XP has a context-based "cut" command similar to Unix's. I have a CUT command
written in BASIC, but it's clunky. You might find one at unxutils.sourceforge.net (/unxutils.zip), but their FTP
mirror is down, so sourceforge might be defunct.
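
For what it's worth, here is a rough sketch of how the two snips could be done with for /f alone, with no cut command. It is not tested against real pages and makes several assumptions: each link sits on a single line, the URL is wrapped in double quotes, and only the first href on a line is kept. It uses findstr rather than find, since find prints a "---------- FILENAME" header for each file. The output name urls.txt is made up for this example. Any href is picked up, not just those in a tags, and lines containing ! may come out mangled because delayed expansion is switched on.

```bat
@echo off
setlocal enabledelayedexpansion
rem Sketch only: one link per line, URL in double quotes, first href wins.
rem urls.txt is a made-up output name.
(
  for %%f in (*.htm) do (
    for /f "delims=" %%l in ('findstr /i /c:"href=" "%%f"') do (
      set "line=%%l"
      rem First snip: drop everything up to and including the word href
      set "rest=!line:*href=!"
      rem Second snip: split on double quotes and keep token 2, the URL
      for /f tokens^=2^ delims^=^" %%u in ("!rest!") do echo %%u
    )
  )
) > urls.txt
```

Save it as a .bat file in the folder with the html files and run it from there. The first snip stops at the word href rather than at the equals sign because = cannot appear in the search part of a %var:old=new% substitution. The leftover = then ends up in token 1 and is thrown away.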
