Solved Batch file - read html tags and output to file

August 30, 2012 at 08:06:29
Specs: Windows XP
Hello - I need to read the text between <title> and </title> tags in multiple .html files in a file directory and output this text to a new text file. The number and names of the .html files will vary, so I need to be prepared for that. I am hoping this can be done with a plain batch file that can run under Windows XP.

Any help on this problem is much appreciated.



#1
August 31, 2012 at 02:27:09
Post your htm.


====================================
Life is too important to be taken seriously.

M2



#2
August 31, 2012 at 07:13:00
All of the files will be generated by a system, generally in the format shown below. My goal is to read the text between <title> and </title> in each one.

<html>
<head><link rel='stylesheet' TYPE='text/css' HREF='lnp.css'/><title>Catheter irrigation</title></head>
<body><span class='section'>Catheter irrigation</span>
<span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>Introduction</span><div>
<div class="Introduction">

To avoid introducing microorganisms into the bladder, the nurse irrigates an indwelling catheter to remove an obstruction, such as a blood clot, <span style="font-size: 12pt; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA;"><span style="font-family: andale mono,times;"><span style="font-size: small;">mucus, or sediment, and  to maintain unobstructed urine flow</span></span></span>. In some cases, the nurse may instill a medication that works directly on the bladder wall. Whenever possible, the catheter should be irrigated through a closed system (using the aspiration port) to decrease the risk of infection.</p></div></div><span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>Equipment</span><div>
<div class="Equipment"><ul class="disc"><li>Ordered irrigating solution (such as normal saline solution)</li><li>Sterile basin</li><li>30- to 60-ml syringe</li><li>18G blunt-end needle (if system not needleless)</li><li>Two alcohol pads</li><li>Gloves</li><li>Linen-saver pad</li><li>Intake-output sheet</li><li>Clamp</li></ul>

Commercially packaged kits containing sterile irrigating solution, a graduated receptacle, and a 50-ml catheter tip syringe may be available. <span style="font-size: 12pt; font-family: "Times New Roman"; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA;">.<span style="font-family: Times New Roman;"><span style="mso-spacerun: yes;">  </span>Most kits will contain an adaptor which can be placed on the end of the syringe tip for accessing the aspiration port. </span><span style="mso-spacerun: yes;"><span style="font-family: Times New Roman;"> </span></span></span></p></div></div><span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>Preparation of Equipment</span><div>
<div class="Preparation of Equipment">

Check the expiration date on the irrigating solution. To prevent vesical spasms during instillation of solution, warm it to room temperature. Never heat the solution on a burner or in a microwave oven. Hot irrigating solution can injure the patient's bladder.</p> </div></div><span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>Implementation</span><div>
<ul class="disc"><li class="MsoNormal" style="background: white; margin: 7.5pt 0in; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Tahoma;">A physician’s order is required for catheter irrigation.<span style="mso-spacerun: yes;">  </span>The order should include type of irrigant, amount to be irrigated, and frequency of irrigation.</span></li><li>Confirm the patient's identity using two patient identifiers according to your facility's policy.</li><li>Wash your hands, and assemble the equipment at the bedside. Explain the procedure to the patient, and provide privacy.</li><li>Put on gloves.</li><li>Expose the catheter's aspiration port and place a linen-saver pad under it <em>to protect the bed linens.</em></li><li>Create a sterile field at the patient's bedside. Using aseptic technique, pour the prescribed amount of solution into the basin.</li><li class="MsoNormal" style="background: white; margin: 7.5pt 0in; text-align: left; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Tahoma;">Attach the luer-lock adaptor to the end of the syringe.<span style="mso-spacerun: yes;">  </span>(Pull the piston out of the barrel and attach the syringe tip to the luer-lock adaptor.<span style="mso-spacerun: yes;">  </span>Secure well.<span style="mso-spacerun: yes;">  </span>Replace the piston into the barrel of the syringe.)<span style="mso-spacerun: yes;">  </span>Maintain sterility of equipment. 
</span></li><li class="MsoNormal" style="background: white; margin: 7.5pt 0in; text-align: left; mso-list: l0 level1 lfo1; tab-stops: list .5in;">Place the tip of the syringe into the solution and fill the syringe with the appropriate amount.</li></ul><p class="center"><img class="border" src="../css/images/b3200.jpg" alt="Filling the syringe" /></p><ul class="disc"><li>Scrub the aspiration port with an alcohol pad for 20 seconds <em>to remove as many bacterial contaminants as possible.</em></li></ul><p class="center"><img class="border" src="../css/images/b3201.jpg" alt="Cleaning the port" /></p><ul class="disc"><li>Clamp the catheter tubing below the aspiration port.</li></ul><p class="center"><img class="border" src="../css/images/b3202.jpg" alt="Clamping the tubing" /></p><ul class="disc"><li>Attach the syringe to the port, or insert the blunt-tip needle into the port if a needleless system isn't in place.</li><li>Instill the irrigating solution into the catheter. If necessary, refill the syringe and repeat this step until you've instilled the prescribed amount of irrigating solution.</li><li>Remove the syringe and unclamp the drainage tube <em>to allow the irrigant and urine to flow into the drainage bags.</em></li><li>Make sure the catheter tubing is secured to the patient's leg and that the drainage bag is below the level of the bladder.</li><li>Dispose of all used supplies properly.</li></ul></div><span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>Special Considerations</span><div>
<div class="Special Considerations"><ul class="disc"><li>If you encounter any resistance during instillation of the irrigating solution, don't try to force the solution into the bladder. Instead, stop the procedure and notify: nurse clinician, manager, house supervisor, then doctor. If an indwelling catheter becomes totally obstructed, obtain an order to remove it and replace it with a new one <em>to prevent bladder distention, acute renal failure, urinary stasis, and subsequent infection.</em></li><li>The doctor may order a continuous irrigation system. <em>This decreases the risk of infection by eliminating the need to disconnect the catheter and drainage tube repeatedly.</em> (See the "Bladder irrigation, continuous" procedure.)</li><li>Encourage catheterized patients not on restricted fluid intake to increase intake to 3,000 ml per day <em>to help flush the urinary system and reduce sediment formation.</em></li></ul></div></div><span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>Documentation</span><div>
<div class="Documentation">

<span style="font-size: 12pt; font-family: "Times New Roman"; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA;"><span style="font-family: andale mono,times;"><span style="font-size: small;">Docoument the date and time the irrigation was performed and </span></span><span style="font-family: Times New Roman;">the type and amount of irrigant used.</span><span style="mso-spacerun: yes;"><span style="font-family: Times New Roman;">  </span></span></span></p>

Note the amount, color, and consistency of return urine flow, and document the patient's tolerance for the procedure. Also note any resistance during instillation of the solution. If the return flow volume is less than the amount of solution instilled, note this on the intake and output balance sheets and in your notes.</p></div></div><span class='section'>
<img class='block' src='http://procedures.lww.com/css/images/block.gif' alt='block'/>References</span><div>
<div class="References"><p class="ListParagraphCxSpFirst" style="margin: 0in 0in 0pt; mso-add-space: auto;"><p class="ListParagraphCxSpLast" style="margin: 0in 0in 0pt; mso-add-space: auto;"><span style="font-size: 12pt; background: yellow; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: 'Times New Roman'; mso-highlight: yellow;"></span></p><p class="ListParagraph" style="margin: 0in 0in 0pt; mso-add-space: auto;"><span style="font-size: 12pt; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: 'Times New Roman';"><span style="color: #000000;"> </span></span></p></p><p class="ListParagraphCxSpLast" style="margin: 0in 0in 0pt; mso-add-space: auto;"><span style="font-size: 12pt; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: 'Times New Roman';"><span style="color: #000000;">American Heart Association (2006). <em style="mso-bidi-font-style: normal;">Pediatric Advanced Life Support: Provider Manual.</em> M. Ralston, M. F. Hazinski, A. L. Zaritsky, S. M. Schexnayder, & M. E. Kleinman (Eds.), Channing Bete CO. South Deerfield, MA. </span></span></p>

Association for Professionals in Infection Control and Epidemiology (APIC): Guide to the Elimination of Catheterr-Associated Urinary Tract Infections (CAUTIs). Retrieved June 1, 2009 from <span style="font-size: 12pt; color: blue; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: Arial; mso-bidi-font-size: 10.0pt;"><a title="http://www.apic.org/Content/NavigationMenu/PracticeGuidance/APICEliminationGuides/CAUTI_Guide.pdf" href="http://www.apic.org/Content/NavigationMenu/PracticeGuidance/APICEliminationGuides/CAUTI_Guide.pdf">http://www.apic.org/Content/NavigationMenu/PracticeGuidance/APICEliminationGuides/CAUTI_Guide.pdf<span style="font-size: 12pt; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: 'Times New Roman';">Institute for Healthcare Improvement (IHI):<span style="mso-spacerun: yes;">  </span>Preventing Catheter-Associated Urinary Tract Infections. Retrieved June 1, 2009 from: </span><span style="font-size: 10pt; color: blue; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: Arial;">  </span><span style="font-size: 12pt; color: blue; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: Arial; mso-bidi-font-size: 10.0pt;"><a title="http://www.ihi.org/IHI/Programs/ImprovementMap/PreventCatheterAssociatedUrinaryTractInfections.htm" href="http://www.ihi.org/IHI/Programs/ImprovementMap/PreventCatheterAssociatedUrinaryTractInfections.htm">http://www.ihi.org/IHI/Programs/ImprovementMap/PreventCatheterAssociatedUrinaryTractInfections.htm</span><span style="font-size: 12pt; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: 'Times New Roman';"></span></span><span style="font-size: 12pt; line-height: 115%; font-family: Tahoma; mso-bidi-font-family: 'Times New Roman';"></span></p><p class="ListParagraphCxSpMiddle" style="margin: 0in 0in 0pt; mso-add-space: auto;"><p class="ListParagraphCxSpLast" style="margin: 0in 0in 0pt; mso-add-space: auto;"><span style="font-size: 12pt; line-height: 115%; font-family: 
Tahoma; mso-bidi-font-family: 'Times New Roman';"><span style="color: #000000;"></span></span></p></p>

Lo, E., et al. (2008). Strategies to prevent catheter-associated urinary tract infections in acute care hospitals. <em>Infection control & hospital epidemiology, 29</em>(S1), S41-50.</p>

Rew, M., "Caring for Catheterized Patients: Urinary Catheter Maintenance," <em>British Journal of Nursing</em> 14(2):87-92, January-February 2005.</p>

Taylor, C., et al. <em>Fundamentals of Nursing: The Art and Science of Nursing Care,</em> 6th ed. Philadelphia: Lippincott Williams & Wilkins, 2008.</p></div></div></body></html>



#3
August 31, 2012 at 07:18:07
Sorry if the content is a little gross. This is for a hospital.


#4
September 3, 2012 at 18:43:19
✔ Best Answer
This might work, but only if <title> and </title> are both on the same physical line.
::===begin script
@echo off & setlocal enabledelayedexpansion
rem findstr prefixes each matching line with its file name when given a wildcard
for /f "tokens=*" %%a in ('findstr /i /c:"<title>" *.htm') do (
set x="%%a"
rem You might get away with a single delimiter character, but I used 3 for safety.
set x=!x:title^>=*@$!
rem Token 2 is the text between the two replaced title markers
for /f "tokens=2 delims=*@$" %%b in (!x!) do (
set y="%%b"
rem Drop the leading quote, then the trailing quote and the 2 characters left from the closing tag
set y=!y:~1,-3!
echo FINAL OUTPUT is: !y!
>>titles echo !y!
)
)
::=== end
PS: Most of us have dealt with hospital procedures, and these are mild compared to some. C'est la vie!
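
If the tags can end up on different lines, or if something other than plain batch is an option, the same idea can be sketched in Python. This is only an illustration and not part of the answer above: the names `extract_title` and `collect_titles` and the output file name `titles.txt` are made up for the example, and a regular expression is a shortcut that suits simple, machine-generated pages like the sample rather than a full HTML parser.

```python
import glob
import os
import re

# Matches <title>...</title> case-insensitively; DOTALL lets the text span lines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text between the first <title> pair, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse line breaks and runs of spaces inside the title.
    return " ".join(match.group(1).split())


def collect_titles(folder, out_name="titles.txt"):
    """Write one title per line for every .htm/.html file in folder (hypothetical helper)."""
    paths = sorted(glob.glob(os.path.join(folder, "*.htm*")))
    titles = []
    for path in paths:
        # Assumes UTF-8; undecodable bytes are replaced instead of raising.
        with open(path, encoding="utf-8", errors="replace") as handle:
            title = extract_title(handle.read())
        if title is not None:
            titles.append(title)
    with open(os.path.join(folder, out_name), "w", encoding="utf-8") as out:
        out.write("\n".join(titles) + ("\n" if titles else ""))
    return titles
```

Calling `collect_titles(".")` from the folder that holds the pages would write `titles.txt` next to them. Files without a title are skipped rather than reported.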


#5
September 4, 2012 at 07:09:38
Thanks, this works wonderfully! I appreciate your time in helping with this project!


#6
January 22, 2013 at 20:04:25
I was looking for something similar to this.

We have about 100 or so HTML files that we routinely get from 10 different vendors (and can you believe none of them can do a data export? They are banks, so what do you expect but antiquated technology).

Anyway, we need to parse only three (3) variables from each HTML page; however, each HTML page has exactly 24 items. So the output for each HTML page would be 3 variables multiplied by 24 items = 72 total pieces of data per HTML page.

The variables are:
LoanProgramID (gov-programid), LoanProgramName(gov-programname), LoanProgramRatio(gov-programmaxratio)

The HTML files are in a single directory, all with unique file names each time, but all named *.html

We want to run through each file, read each HTML page, and extract all the data between the identifiers.

Sample data with 3 records in it (each with 3 variables):
(I trimmed this because the HTML being rendered is not clean and is very lengthy.)
extra text gov-programid="U100785EMG" gov-programname="Fannie Mae Rennovation"gov-programmaxratio="70/10/10" extra text extra text gov-programid="U100787EMG" gov-programname="Fannie Mae Rennovation Rural"gov-programmaxratio="80/0/0" extra text extra text gov-programid="U100789EMG" gov-programname="Fannie Mae Community Program"gov-programmaxratio="75/15/5" extra text extra text

The only good thing about the HTML files we want to parse is that they all have one thing in common.
They have this exact pattern in them, and it all falls on one line in the HTML (though it wraps on a screen):
gov-programid="*" gov-programname="*"gov-programmaxratio="*"

Is there a way to parse multiple HTML files for multiple variables (three) that have multiple records (24) and output the results to a text/CSV file as comma-separated values?

I'm working on a few scripts, and also attempting some more advanced logic in VBA & MS Access 2013, but I feel that Batch should be able to do this more easily than my method.

Any takers for a little guidance?



#7
January 22, 2013 at 22:32:51
Here might be something to start with. It will need to evolve - it is only a prototype:
::====== begin batch
@echo off & setlocal
for /f "tokens=*" %%a in ('find /i "gov-programid=" *.htm') do (
call :xx "%%a"
)
goto :eof

:xx
set z=%~1
set z=%z:"=@%
for /f "tokens=1-6 delims=@=" %%b in ("%z%") do (
if "%%c" neq "" (
echo %%b,%%c
echo %%d,%%e
echo %%f,%%g
)
)
::===== end batch
See if this even comes close; then maybe it can be made to work. It's not good code: I had to substitute @ for " in order to parse out the quotes using delims. VBScript is another option that I can also provide if this fails.
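
For comparison, here is a rough Python sketch of the same extraction. It is not the batch prototype above and not anything the posters used. It assumes the three attributes always appear in the order given in the question, that attribute values never contain a double quote, and the function names `extract_records` and `records_to_csv` are invented for the example. The CSV column names are taken from the question.

```python
import csv
import io
import re

# One record: the three attributes in the order described in the thread,
# with optional whitespace between them.
RECORD_RE = re.compile(
    r'gov-programid="([^"]*)"\s*'
    r'gov-programname="([^"]*)"\s*'
    r'gov-programmaxratio="([^"]*)"',
    re.IGNORECASE,
)


def extract_records(html):
    """Return a list of (id, name, ratio) tuples found in the page text."""
    return RECORD_RE.findall(html)


def records_to_csv(records):
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["LoanProgramID", "LoanProgramName", "LoanProgramRatio"])
    writer.writerows(records)
    return buffer.getvalue()


# Two of the sample records from the question, copied as posted.
sample = (
    'extra text gov-programid="U100785EMG" gov-programname="Fannie Mae Rennovation"'
    'gov-programmaxratio="70/10/10" extra text extra text '
    'gov-programid="U100787EMG" gov-programname="Fannie Mae Rennovation Rural"'
    'gov-programmaxratio="80/0/0" extra text'
)
print(records_to_csv(extract_records(sample)), end="")
```

Because `findall` scans the whole text, it does not matter whether all 24 records sit on one very long line. Looping over the files in a folder and appending each page's rows to one output file would be the remaining step.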



#8
January 23, 2013 at 07:49:37
Wow, thanks a lot for getting back so quickly!
I tried this and it didn't seem to show any results.
I even added '>> outpu.txt' after the final close paren ')'.

I also tested just parsing for 'gov-' to see if the results were any different...and nothing changed.

FWIW, I am running the .bat file in the same directory as the .HTML files.

The VBScript option also sounds like a good alternative. Would you mind sharing a sample of that?



#9
January 23, 2013 at 14:07:03
Ok, but first, are the files named xxx.htm or xxx.html? The extension needs to be right.
Next, put in a couple of "echo" statements to see what's going on (it seems like people always forget that they have this option!). Right after the :xx label:
echo %1
right after the second for /f, look at %%c:
echo "%%c"
and let me know.
And, if you can, please either post or p-mail me a sample html file (with anything sensitive removed/altered, of course). I can use that to work with and debug my code (whether vbscript or batch).


#10
January 25, 2013 at 22:58:10
@nbrane, I think I have it figured out now.
I had to adjust a few oddball things to accommodate something for our need. I'll post the code later so others can use it.

One way was to extract/parse from the file.
An alternate means to our goal was to edit the source file in such a way that the system we have that scrubs the data can import it. We did this by editing the original HTML file using a VBS and then importing it to a database where some pre-existing logic extracts the needed feeds.

I just tested 150 files and it churned through them in about 3 minutes, which is way better than we need.

Now we can dump new HTML files from our vendors in one or more folders and have the database pull them in and process them. This exercise was a great lesson.

I learned that "YES YOU CAN" edit a text file from a script with search and replace functionality, and it's simple to do.

Code post to follow



#11
January 25, 2013 at 23:07:03
So we have this code saved as "replace.vbs"
-----------------------------
Const ForReading = 1
Const ForWriting = 2

strFileName = Wscript.Arguments(0)
strOldText = Wscript.Arguments(1)
strNewText = vbCRLF & Wscript.Arguments(2)

' Read the whole file into memory
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(strFileName, ForReading)

strText = objFile.ReadAll
objFile.Close
' Replace is case-sensitive by default
strNewText = Replace(strText, strOldText, strNewText)

' Overwrite the original file with the edited text
Set objFile = objFSO.OpenTextFile(strFileName, ForWriting)
objFile.WriteLine strNewText
objFile.Close
-----------------------------

Put the .vbs file in the directory you are working with

Call this from a command line as: replace.vbs "FileName.ext" "SearchForTerm" "ReplaceWithTerm"

example: replace.vbs "TestFileName.html" "gov-programId=" "gov-programId="


"Our" example has the replacement add a line break too (on line 6, as the vbCRLF).
If you do not want the line break, remove that and the ampersand.

With this script we were able to take poorly formatted HTML pages with nearly 110,000 characters on a line, which made them hard to import. We found common identifiers and used search and replace to make line breaks in the HTML so it could be imported and then parsed using another system we have that does intelligent data parsing.
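
The line-breaking trick described here is not tied to VBScript. A minimal Python sketch of the same search and replace follows, purely as an illustration: the function names `break_before` and `rewrite_file` are invented, and reading the file as latin-1 is an assumption made so that every byte survives the round trip unchanged whatever the real encoding is.

```python
def break_before(text, marker, newline="\r\n"):
    """Insert a line break in front of every occurrence of marker (case-sensitive)."""
    return text.replace(marker, newline + marker)


def rewrite_file(path, marker):
    """Read the whole file, add the line breaks, and write it back in place."""
    # newline="" turns off newline translation, so existing line endings are kept.
    with open(path, encoding="latin-1", newline="") as handle:
        text = handle.read()
    with open(path, "w", encoding="latin-1", newline="") as handle:
        handle.write(break_before(text, marker))
```

For example, `rewrite_file("TestFileName.html", "gov-programid=")` would start a new line at each record in that file. Like the script above, it overwrites the original, so working on a copy is the safer habit.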



#12
January 26, 2013 at 10:18:08
Yeah, I was going to modify the code I posted here for you to try, but it looks like you beat me to the punch:
http://www.computing.net/answers/pr...
And my "take" on it was just to grab those three items and pull them out:
dim x(2)
x(0)="gov-programid"
x(1)="gov-programname"
x(2)="gov-programmaxratio"
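First, "ReadAll" the file into the variable "htm", then:
for i=0 to 2
' find the attribute name, ignoring case
p=instr(lcase(htm),x(i))
' p2 and p3 are the opening and closing double quotes of its value
p2=instr(p+1,htm,chr(34))
p3=instr(p2+1,htm,chr(34))
' write name,"value" (the quotes are included in the output)
outfile.writeline x(i)&","&mid(htm,p2,p3-p2+1)
next
'====== end snippet
This assumes that the 3 items aren't always contiguous in the HTML. As written, it picks up only the first occurrence of each item.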
