[Solved] New at VB, can someone help?

April 10, 2015 at 07:12:08
Specs: Windows 7
I'm trying to extract the data between a start and an end HTML tag in a text file and write that data, without the tags, to a new file using VB. I have a 'template' started, but I'm not sure what to do with it.


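Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        ' Read the whole file at once (ReadLine only returns the first line,
        ' and an NBX block spans many lines). File names must be quoted strings.
        Dim readstring As String = File.ReadAllText("NBX.txt")

        ' Collect the text between every <NBX> ... </NBX> pair
        Dim results As List(Of String) = Get_HTMLTag("NBX", readstring)

        ' One extracted block per line in the output file
        Dim writestring As String = String.Join(Environment.NewLine, results)
        File.WriteAllText("output.txt", writestring, Encoding.Default)
    End Sub

    Private Function Get_HTMLTag(ByVal TagName As String, ByVal HTML As String) As List(Of String)
        Dim lMatch As New List(Of String) 'Get the results in a List of strings

        ' Build the pattern from the TagName parameter; the lazy .*? stops at the first closing tag
        Dim Tag As New Regex("(?<=<" & TagName & ">).*?(?=</" & TagName & ">)", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        For Each rMatch As Match In Tag.Matches(HTML)
            lMatch.Add(rMatch.Value)
        Next

        Return lMatch
    End Function

End Module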





#1
April 11, 2015 at 12:38:59
✔ Best Answer
My regexp ability is "slim to none", so I can't quite decipher what your regexp is trying to do. It would help (me, anyway) if you posted a sample of the input HTML and a sample of the output you want from it. I'm not even sure whether you want the tags or the text between the tags (the visible text). Also, you might want to back up and consider methods other than regexp. My attempts at using regex.replace with "<" and ">" patterns (like "<.*>" or ">.*<") failed any time there was more than one tag on a line, because the match ran from the first tag to the very last one on the line. For what it's worth, here's a VBScript extractor that uses the "split" function instead of regexp:
'==== begin vbscript
fil="test.htm"
set fso=createobject("scripting.filesystemobject")
x=fso.opentextfile(fil).readall
'-- remove line breaks, which can fall anywhere inside html
x=replace(x,vbcrlf,"")
'-- split on ">" so each piece is "text before a tag" followed by "<tagname"
z=split(x,">")
wscript.echo ubound(z)
oput=""
for i=0 to ubound(z)
'wscript.echo z(i)
'-- append a "<" in case the piece has none, so a(1) always exists
a=split(z(i)&"<","<")
wscript.echo "string: "&a(0)
wscript.echo "tag: "&a(1)
'-- to collect the tags instead of the text, use a(1) in the following line
oput=oput&a(0)&vbcrlf
next
fso.opentextfile("testout",2,true).write oput
'======= end vbscript
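For anyone following along in VB.NET rather than VBScript, the split approach above ports over roughly as follows. This is a minimal sketch, not nbrane's code: the module name `SplitExtractor`, the function `ExtractText`, and the choice to skip blank pieces (the script writes them out as empty lines) are assumptions made for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module SplitExtractor

    ' Sketch of the split-based VBScript: returns the visible text between tags.
    Public Function ExtractText(ByVal html As String) As List(Of String)
        Dim results As New List(Of String)
        ' Drop line breaks, as the script does
        html = html.Replace(vbCr, "").Replace(vbLf, "")
        ' After splitting on ">", each piece looks like "text before a tag" + "<tagname"
        For Each piece As String In html.Split(">"c)
            ' Appending "<" guarantees the inner split always has at least two parts
            Dim parts() As String = (piece & "<").Split("<"c)
            Dim visible As String = parts(0)
            ' Assumption: skip the empty or whitespace-only pieces between adjacent tags
            If visible.Trim() <> "" Then results.Add(visible)
        Next
        Return results
    End Function

    Sub Main()
        Dim sample As String = "<HEADERFIELDS>" & vbCrLf & "<PDI>N</PDI>" & vbCrLf & "<DAT>January 1, 2015</DAT>"
        ' Prints "N" and then "January 1, 2015"
        For Each s As String In ExtractText(sample)
            Console.WriteLine(s)
        Next
    End Sub

End Module
```

Like the script, this joins words that were split across a line break inside a tag's text, since the line breaks are simply removed.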

message edited by nbrane
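The regex failure described above comes from greediness: `.*` grabs as much as it can, so `<.*>` stretches from the first `<` on a line to the last `>`. Adding `?` makes the quantifier lazy, so each match stops at the nearest `>`. A small VB.NET sketch of the difference, using hypothetical helper names `StripTagsGreedy` and `StripTagsLazy`:

```vb
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazy

    ' Greedy: "<.*>" runs from the first "<" to the LAST ">" on the line,
    ' so the text between the tags is removed along with them.
    Public Function StripTagsGreedy(ByVal html As String) As String
        Return Regex.Replace(html, "<.*>", "")
    End Function

    ' Lazy: "<.*?>" stops at the first ">" after each "<", so only the tags are removed.
    Public Function StripTagsLazy(ByVal html As String) As String
        Return Regex.Replace(html, "<.*?>", "")
    End Function

    Sub Main()
        Dim sample As String = "<b>bold</b> and <i>italic</i>"
        Console.WriteLine("greedy: [" & StripTagsGreedy(sample) & "]")   ' greedy: []
        Console.WriteLine("lazy:   [" & StripTagsLazy(sample) & "]")     ' lazy:   [bold and italic]
    End Sub

End Module
```

On this input a negated character class, "<[^>]*>", gives the same result as the lazy pattern. Neither is a real HTML parser, so comments or attribute values containing ">" can still trip them up.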



#2
April 13, 2015 at 13:42:45
I tried running your script as a batch file. It didn't work for me for some reason. I'm trying to get the text that is in between the tags:

Sample HTML:

<NBX>
<HEADERFIELDS>
<PDI>N</PDI>
<DAT>January 1, 2015</DAT>
<YMD>2015-01-01</YMD>
<PAP>Press of Atlantic City, The (NJ)</PAP>
<PBI>0C9C3566D46E3280</PBI>
<DSI>0EABABB8B3650A25</DSI>
<COP>Copyright, 2015, South Jersey Publishing Company t/a The Press of Atlantic City</COP>
<ROY>Press of Atlantic City, The (NJ)</ROY>
<UNQ>1529E35524B873A0</UNQ>
<SQN>15010218712198</SQN>
<EPP>1</EPP>
<WCT>33</WCT>
<EDT>All</EDT>
<PAG>D1</PAG>
<SEC>Sports</SEC>
<HED>Sports / Best of The Press online (Photo)</HED>
</HEADERFIELDS>
<TEXTFIELDS>
<GRC>Holy Spirit's John Middleton collides with Wildwood Catholic's George Cook during a game earlier this year. For a gallery of our best sports photography, go to <b>PressofAC.com/top10</b> - Staff photo by Ben Fogletto</GRC>
</TEXTFIELDS></NBX>

I tried the following code instead and I'm not receiving any output.

' Reads the whole file into one string (ReadLine drops the line breaks, which is fine here).
' Late binding, so no reference to Microsoft Scripting Runtime is needed.
Public Function ReadTextFile(strPath As String) As String
Dim fso As Object
Dim ts As Object
Dim strOutput As String
Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile(strPath)
Do Until ts.AtEndOfStream
strOutput = strOutput & ts.ReadLine
Loop

ts.Close
ReadTextFile = strOutput
End Function


' Returns the text between <TagName> and </TagName>, or "" if either tag is missing
Public Function getData(strText As String, TagName As String) As String
Dim StartI As Long, EndI As Long

StartI = InStr(1, strText, "<" & TagName & ">")
If StartI = 0 Then Exit Function
StartI = StartI + Len(TagName) + 2              ' first character after the opening tag
EndI = InStr(StartI, strText, "</" & TagName & ">")
If EndI = 0 Then Exit Function
getData = Mid(strText, StartI, EndI - StartI)   ' EndI is where the closing tag begins
End Function


' A Sub (not a Function) so it can be run as a macro.
' Writes one report block per <NBX>...</NBX> story in the input file.
Public Sub getOutput()
Dim strText As String
Dim filePath As String
Dim stories() As String
Dim fileNum As Integer
Dim i As Long

strText = ReadTextFile("c:\users\msan\documents\test\NBX1.txt")
filePath = "c:\users\msan\documents\test\output.txt"
stories = Split(strText, "</NBX>")

fileNum = FreeFile
Open filePath For Output As #fileNum
For i = 0 To UBound(stories)
If InStr(stories(i), "<NBX>") > 0 Then
Print #fileNum, "Story ID: " & getData(stories(i), "UNQ") & vbNewLine
Print #fileNum, "Story Date: " & getData(stories(i), "YMD") & vbNewLine
Print #fileNum, "News Source: " & getData(stories(i), "PAP") & vbNewLine
Print #fileNum, "Byline: " & getData(stories(i), "WSR") & vbNewLine
Print #fileNum, "Headline: " & getData(stories(i), "HED") & vbNewLine
Print #fileNum, "Story:" & vbNewLine & vbTab & getData(stories(i), "TEXTFIELDS") & vbNewLine & vbNewLine
End If
Next
Close #fileNum

End Sub
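If a VB.NET version of the same report is useful, the sketch below uses one lazy regex per field instead of InStr/Mid and loops over every <NBX>...</NBX> record. The names `StoryReport`, `GetField`, and `BuildReport` are hypothetical, and it assumes each story is wrapped in its own NBX element as in the sample above. To use real files, pass `System.IO.File.ReadAllText(...)` into `BuildReport` and save the result with `System.IO.File.WriteAllText(...)`.

```vb
Imports System
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module StoryReport

    ' Text between <tagName> and </tagName> in one record, or "" when the tag is absent.
    Public Function GetField(ByVal record As String, ByVal tagName As String) As String
        Dim m As Match = Regex.Match(record, "<" & tagName & ">(.*?)</" & tagName & ">", RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value
        Return ""
    End Function

    ' Builds one report block for every <NBX>...</NBX> record in the input.
    Public Function BuildReport(ByVal allText As String) As String
        Dim sb As New StringBuilder()
        ' The lazy .*? keeps each match inside a single record
        For Each rec As Match In Regex.Matches(allText, "<NBX>(.*?)</NBX>", RegexOptions.Singleline)
            Dim body As String = rec.Groups(1).Value
            sb.AppendLine("Story ID: " & GetField(body, "UNQ"))
            sb.AppendLine("Story Date: " & GetField(body, "YMD"))
            sb.AppendLine("News Source: " & GetField(body, "PAP"))
            sb.AppendLine("Byline: " & GetField(body, "WSR"))
            sb.AppendLine("Headline: " & GetField(body, "HED"))
            sb.AppendLine("Story:")
            sb.AppendLine(vbTab & GetField(body, "TEXTFIELDS").Trim())
            sb.AppendLine()
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Small inline sample in place of a real input file
        Dim sample As String = "<NBX><UNQ>1529E35524B873A0</UNQ><YMD>2015-01-01</YMD>" &
            "<HED>Sports / Best of The Press online (Photo)</HED></NBX>"
        Console.Write(BuildReport(sample))
    End Sub

End Module
```

As in the original macro, the Story field keeps any tags nested inside <TEXTFIELDS> (such as <GRC>); stripping those would be a separate step.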





#3
April 13, 2015 at 19:10:01
My apologies: the script I submitted was VBScript, not batch. I saved the VBScript text file as "HTMEX.VBS". To run it from the command prompt:
CSCRIPT HTMEX.VBS
The input file for my script is "test.htm" (set on line 2), and the output file is "testout" (set on the last line of code).
VBScript will never run as a batch script, but many VBScript functions and operations carry over to Visual Basic.
The test I ran seemed to work OK...


#4
April 14, 2015 at 07:32:24
That's awesome...it worked! Thank you so much!! Is there a way to specify certain tags only?
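On the question of pulling out only certain tags: one way, sketched here and not the approach the thread settled on, is to search for each wanted tag by name. `ExtractTags` and the tag list passed to it are illustrative assumptions. Searching tag by tag, rather than matching every tag generically, sidesteps container elements such as <HEADERFIELDS> in the sample, which would otherwise swallow the tags nested inside them.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SelectedTags

    ' Returns "TAG: value" entries for the requested tags only, in the order requested.
    Public Function ExtractTags(ByVal html As String, ByVal wanted As IEnumerable(Of String)) As List(Of String)
        Dim found As New List(Of String)
        For Each tagName As String In wanted
            ' Regex.Escape keeps any special characters in the tag name literal
            Dim name As String = Regex.Escape(tagName)
            Dim pattern As String = "<" & name & ">(.*?)</" & name & ">"
            For Each m As Match In Regex.Matches(html, pattern, RegexOptions.Singleline Or RegexOptions.IgnoreCase)
                found.Add(tagName & ": " & m.Groups(1).Value)
            Next
        Next
        Return found
    End Function

    Sub Main()
        Dim sample As String = "<PDI>N</PDI>" & vbCrLf & "<DAT>January 1, 2015</DAT>" & vbCrLf & "<SEC>Sports</SEC>"
        ' Prints "SEC: Sports" and then "PDI: N"
        For Each entry As String In ExtractTags(sample, New String() {"SEC", "PDI"})
            Console.WriteLine(entry)
        Next
    End Sub

End Module
```

Tags that are not in the list (here <DAT>) are simply never searched for, and a requested tag that is missing from the file produces no entry.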


#5
April 14, 2015 at 07:33:22
Btw, I forgot my login, so I had to create a new account. Sorry.


#6
April 14, 2015 at 08:09:42
Actually, disregard my previous post. I'm good now. Thanks again!!
