Solved: Parse HTML file with VBScript

May 16, 2014 at 07:31:42
Specs: Microsoft Windows 7 Enterprise, 2.1 GHz / 4063 MB
Hello all,

I am writing a script to bulk copy/rename files and need a little help. I really don't have any experience with VBScript, but I think I've done pretty well getting to where I am.

I need help writing a portion of code that will loop until it finds a given file name. For example, the relevant part of the HTML file looks like this:

<td width=544 valign=top style='width:407.7pt;padding:0cm 5.4pt 0cm 5.4pt'>
  <p class=OHHpara align=left style='text-align:left'><span style='font-size:
  11.0pt'><a href="#">Servicing Agreement</a></span></p>

EDIT: Not sure why, but the # in the href should be a link to a local document, in this case 08.pdf. It keeps changing to a # when I edit the post.

I need help getting the script to locate the given file name, in this case "08.pdf", which I already have stored in a variable. I then need it to get the link text that follows, in this case "Servicing Agreement", and store it in a new variable. Here the text is all on one line, but other entries are longer and span two or more lines. So I need the text between the > and < that follow the file name.

I think I can manage the copying/renaming myself, but I am lost as to how to correctly parse this file. Can anyone help?

I can post the code I have so far, as long as you all promise not to laugh ;). It's a Frankenstein of examples I've found online and is very likely not optimal, but so far it does what I need, I think.

Thank you,

Matt

Can I put a question here?

message edited by Matt123




#1
May 16, 2014 at 09:15:38
HTML is one of those cases where it's far easier to have someone else do the heavy lifting. If it's valid HTML, just load it up in IE and navigate the DOM. If it's valid XHTML, load it up in MSXML and get the same benefits of parsing without the IE overhead.

A valid code example would require a valid example of the HTML.
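
For illustration, here is a minimal sketch of the MSXML route. The XHTML string is a made-up stand-in for the real index file, not the poster's actual document. Note that the snippet in the question has unquoted attributes (width=544), so it is not well-formed XML and loadXML would reject it as-is. Real XHTML that declares a namespace would also need the SelectionNamespaces property set before the XPath query matches anything.

```vbscript
' Sketch only: the XHTML below is a hypothetical, well-formed stand-in.
Option Explicit
Dim xml, node, xhtml
xhtml = "<table><tr><td><p><span>" & _
        "<a href=""08.pdf"">Servicing Agreement</a>" & _
        "</span></p></td></tr></table>"

Set xml = CreateObject("MSXML2.DOMDocument.6.0")
xml.async = False
If Not xml.loadXML(xhtml) Then
  WScript.Echo "Not well-formed XML: " & xml.parseError.reason
  WScript.Quit 1
End If

' Visit every <a> element, wherever it sits in the tree.
For Each node In xml.selectNodes("//a")
  WScript.Echo node.getAttribute("href") & " -> " & node.text
Next
```

Run it with cscript so the output goes to the console instead of message boxes.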

How To Ask Questions The Smart Way

message edited by Razor2.3



#2
May 16, 2014 at 10:05:35
Hi Razor,

Glad to see you're still around! It's been a long time since I've posted here.

The html example I gave was changed upon submitting. I tried editing it, but it keeps removing "08.pdf" in the href and replacing it with a #.

I'm confused as to what you mean by your response; I understand that opening the file in IE would allow me to view the links as intended, but I need the pdf files to be copied and renamed according to the name given in the html file link.

These files are stored on a disc, with an index.htm document that is used to navigate the files. The files are then to be stored in a SharePoint library. However, the names they have right now are not very descriptive, and someone (read: me) would have to manually copy/rename hundreds of files to the names given in the HTML file before uploading.

Every index file on every disc is similar in the sense that

RealName.pdf">DesiredName<

occurs the same way. I cannot upload the whole document, as a lot of it is confidential.

I have already managed to write a loop to pick up the names of the files on the disc; I just need help locating that string (the file name) in the HTML file and getting the text that follows, between the > and <.

Apologies if I did not understand your response exactly, as I really don't know anything about HTML and am definitely in over my head here.

Thanks for the help!

Matt

Can I put a question here?

message edited by Matt123



#3
May 16, 2014 at 10:22:05
IE can be scripted, and it can expose the parsed file to the script controlling it. The basic idea is you get IE to load the document in question. Once it's done parsing, you look at the resulting document using the standard DOM tools to grab what you need, and do what you need to it.
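
A minimal sketch of that idea with IE kept hidden, assuming the index file lives at the hypothetical path shown (substitute the real one). It only lists each link's target and text; it does not rename anything.

```vbscript
' Sketch only: the path below is a hypothetical placeholder.
Option Explicit
Dim ie, a
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.Navigate "C:\temp\index.htm"

' Wait for IE to finish loading and parsing (4 = READYSTATE_COMPLETE).
Do While ie.Busy Or ie.ReadyState <> 4
  WScript.Sleep 100
Loop

' Walk every link in the parsed document.
' The flag 2 asks for the href exactly as written in the markup.
For Each a In ie.Document.getElementsByTagName("a")
  WScript.Echo a.getAttribute("href", 2) & " -> " & a.innerText
Next

ie.Quit
```

Because IE does the parsing, link text that wraps across several lines in the source comes back as a single string in innerText.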

Look, I'm going to have some free time tonight / this weekend and I haven't had an excuse to do much VBScript. It's all been PowerShell this, and KornShell that. Throw the HTML into pastebin or whatever, and I'll see what I can throw together.

How To Ask Questions The Smart Way

message edited by Razor2.3



#4
May 16, 2014 at 10:28:11
I was unaware that could be done; I'll have to read up on it. Thank you for the help. I will edit this post with a link to the document once I have removed some of the sensitive information (I really could lose my job over sharing some of the desired names, etc.).

EDIT: Is it all right if I PM you the link instead?

Thank you again!

Can I put a question here?

message edited by Matt123



#5
May 16, 2014 at 23:28:17
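Since it's text-oriented, I submit this feebo (FEEBle Offering):
' Plain-text approach: find the file name, then take the text between the next ">" and "<".
Set fso = CreateObject("Scripting.FileSystemObject")
a = fso.OpenTextFile("matt", 1).ReadAll      ' "matt" is the saved HTML file; 1 = ForReading
pdf = "08.pdf"
p = InStr(a, pdf)
If p = 0 Then                                ' file name not in the HTML at all
  WScript.Echo pdf & " not found"
  WScript.Quit
End If
p2 = InStr(p, a, "<")                        ' first "<" after the file name
z = Mid(a, p, p2 - p)                        ' e.g. 08.pdf">Servicing Agreement

p1 = InStr(z, ">")
file = Left(z, Len(pdf))
targ = Replace(Mid(z, p1 + 1), vbCrLf, " ")  ' link text; join it if it wraps over lines
WScript.Echo "file: " & file & " target:" & targ
WScript.Quit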
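
A variation on the same text-oriented idea, sketched with VBScript's built-in RegExp object instead of InStr/Mid. This is a swapped-in technique, not something from the posts above, and it assumes the href value is wrapped in double quotes. The html string is a made-up stand-in for the file contents, with the link text deliberately split over two lines.

```vbscript
' Sketch only: html is a hypothetical stand-in for the real file contents.
Option Explicit
Dim re, ws, matches, html, fileName, linkText
fileName = "08.pdf"
html = "<a href=""08.pdf"">Servicing" & vbCrLf & "  Agreement</a>"

Set re = New RegExp
re.IgnoreCase = True
' File name, closing quote, rest of the tag, then everything up to the next "<".
re.Pattern = Replace(fileName, ".", "\.") & """[^>]*>([^<]*)<"

Set matches = re.Execute(html)
If matches.Count > 0 Then
  linkText = matches(0).SubMatches(0)
  ' Collapse line breaks and runs of spaces into single spaces.
  Set ws = New RegExp
  ws.Global = True
  ws.Pattern = "\s+"
  WScript.Echo fileName & " -> " & Trim(ws.Replace(linkText, " "))
Else
  WScript.Echo fileName & " not found"
End If
```

Like the InStr version, this returns nothing useful if the link text is wrapped in further tags inside the anchor; the DOM-based approaches do not have that limitation.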

message edited by nbrane



#6
May 18, 2014 at 06:24:26
Thank you for the response, nbrane! I will test this out on Tuesday (Monday is a holiday in Canada) when I have access to the files again. Greatly appreciated!

Can I put a question here?



#7
May 18, 2014 at 12:22:07
✔ Best Answer
Well, I said I'd do code, so here it is. I'm really rusty and I always go overboard when I have a GUI, so the script ends up being bigger than it needs to be. Normally you'd never bother showing IE. It'd simplify matters a bit, as you wouldn't need to use events like I end up doing.

Out of roughly 50 lines, 2 of them are basic initialization. 5 of them are involved with getting the information out of the HTML. 15 are handling the new file name and the actual rename. The rest are about using IE as a UI.

To use: Run the script, then drag the HTML file onto the IE window that spawns. Repeat as needed.

'The setup. All work is handled by IE_DocumentComplete.
initing = True
Set fso = CreateObject("Scripting.FileSystemObject")
Set ie = WScript.CreateObject("InternetExplorer.Application", "IE_")
ie.RegisterAsDropTarget = True : ie.AddressBar = False
ie.Navigate "about:blank"
Do While ie.Busy Or ie.ReadyState <> 4 : WScript.Sleep 50 : Loop 'Wait for the blank page
With ie.Document
  Dim msg : Set msg = .createElement("div")
  msg.innerText = "Please drag/drop the web page here"
  .body.appendChild msg
End With
initing = False : ie.Visible = True

While True 'Script exit handled by IE_OnQuit
  WScript.Sleep 100
Wend

'Fires when IE finishes loading the dropped page.
Sub IE_DocumentComplete(pDisp, URL)
  If initing Then Exit Sub
  ie.RegisterAsDropTarget = False
  dir = fso.GetParentFolderName(URL) & "\"
  
  'Rename each linked file to its link text (2 = href as written in the page).
  For Each a In ie.Document.getElementsByTagName("a")
    RenameFile dir, a.getAttribute("href", 2), a.innerText
  Next 'a
  WScript.Echo "Done. You can close IE, or drag/drop another file."
  ie.RegisterAsDropTarget = True
End Sub

Sub IE_OnQuit()
  WScript.Quit
End Sub

Sub RenameFile(sDir, sOld, ByVal sNew)
  'Skip links that don't point at a file in this folder (e.g. "#" or web URLs).
  If Not fso.FileExists(sDir & sOld) Then Exit Sub
  'Strip characters Windows doesn't allow in file names.
  invalids = Array(":", "\", "/", "*", "?", "<", ">", "|", """")
  For Each c in invalids
    sNew = Replace(sNew, c, "")
  Next 'c
  sNew = Trim(sNew)

  'If the name is taken, append " (1)", " (2)", ... until it is free.
  ext = "." & fso.GetExtensionName(sOld)
  If fso.FileExists(sDir & sNew & ext) Then
    cnt = 1
    Do While fso.FileExists(sDir & sNew & " (" & cnt & ")" & ext)
      cnt = cnt + 1
    Loop
    sNew = sNew & " (" & cnt & ")"
  End If
  fso.GetFile(sDir & sOld).Name = sNew & ext
End Sub

How To Ask Questions The Smart Way

message edited by Razor2.3



#8
May 20, 2014 at 05:50:31
Razor, this is amazing. Thank you so much for all the work you put into this.

At first I was having some issues with it, but I realized that I was using files on the desktop, where I don't have admin permissions. So I moved everything, and it's perfect.

Thank you again,

Matt

Can I put a question here?

