help: parsing html file using perl

Original Message
Name: esu (by Raj)
Date: June 12, 2007 at 14:03:05 Pacific
Subject: help: parsing html file using perl
OS: linux
CPU/Ram: 512
Model/Manufacturer: 64
Comment:
Hi All,

Can someone suggest a way to parse an HTML file for specific text using Perl, or any other programming language?

I have a file, abc.html, which is the output of an automated test suite. This result file contains the status of every test in the suite. All I want is to find each test name and its status (pass/fail) in the HTML file and write them to another text file called xyz.txt. Each line of xyz.txt should contain the test name, then a space, then the status, then another space followed by the description of the test failure. The format of this text file is pasted below as xyz.txt.

==================================
abc.html file looks like this:
==================================
<html>
<body>
<table class="details" border="0" cellpadding="5" cellspacing="2" width="95%">
<tr valign="top">
<th width="80%">Name</th><th>Tests</th><th>Errors</th><th>Failures</th><th nowrap="nowrap">Time(s)</th>
</tr>
<tr valign="top" class="Error">
<td>TestSuite</td><td>34</td><td>1</td><td>1</td><td>321.625</td>

</tr>
</table>
<h2>Tests</h2>
<table class="details" border="0" cellpadding="5" cellspacing="2" width="95%">
<tr valign="top">
<th>Name</th><th>Status</th><th width="80%">Type</th><th nowrap="nowrap">Time(s)</th>
</tr>
<tr valign="top" class="TableRowColor">
<td>testLoginInitialLoad</td><td>Success</td><td></td><td>7.938</td>

</tr>
<tr valign="top" class="TableRowColor">
<td>testFailedSignOn</td><td>Success</td><td></td><td>7.156</td>
</tr>
<tr valign="top" class="TableRowColor">
<td>testLoginSignOn</td><td>Success</td><td></td><td>8.078</td>
</tr>
<td>testComponents</td><td>Failure</td><td>null



<code>junit.framework.AssertionFailedError: null at tests.Components.Components(ProductComponents.java:44)</code></td><td>7.625</td>
</tr>
<tr valign="top" class="TableRowColor">
<td>testProductColumnSort</td><td>Success</td><td></td><td>20.484</td>
</tr>

<tr valign="top" class="Error">
<td>testCompare</td><td>Error</td><td>Product missmatch: To compare , please select exactly two comp of the same product.



<code>tests.ProductMissmatchException: Product missmatch: To compare , please select exactly two comp of the same product. at tests.Compare.testCompare(Compare.java:45)</code></td><td>7.141</td>
</tr>
</body>
</html>
===========================================

xyz.txt
======
testLoginInitialLoad Success
testFailedSignOn Success
testLoginSignOn Success
testComponents Failure junit.framework.AssertionFailedError: null at tests.Components.Components(ProductComponents.java:44)
testCompare Error Product missmatch: To compare , please select exactly two comp of the same product. tests.ProductMissmatchException: Product missmatch: To compare , please select exactly two comp of the same product. at tests.Compare.testCompare(Compare.java:45)




Response Number 1
Name: dmj2
Date: June 12, 2007 at 23:26:30 Pacific
Subject: help: parsing html file using perl
Reply:
Try this:

cat abc.html | sed -e 's/<td>/\%/g' -e 's/<[^>]*>//g' | egrep -v '^$' | tr "%" " " | egrep -i '(Suc|Fail|Err)'



Response Number 2
Name: esu (by Raj)
Date: June 13, 2007 at 11:49:11 Pacific
Subject: help: parsing html file using perl
Reply:
Great, this is what I'm looking for. Thank you very much.

I want to make this an executable so I can call it from another program/script. I'm not sure why the following snippet gives me an error.

#!/usr/bin/perl -w

$File="0_abc.html";
cat $File|sed -e 's/<td>/\%/g'-e 's/<[^>]*>//g'|egrep -v '^$'|tr "%" " " |egrep -i '(Suc|Fail|Err)' >> test.txt
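
The pipeline in the snippet above is shell syntax, so under a `#!/usr/bin/perl` line the Perl interpreter tries to parse `cat $File|sed ...` as Perl code and stops with a syntax error; the second `-e` also needs a space in front of it. One possible way to make the pipeline callable from other scripts is a plain shell script. The following is only a minimal sketch: the function name `extract_results` and the argument handling are assumptions, not something from this thread.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed/grep pipeline from Response 1.
# Usage: extract_results FILE   (prints one line per matching row)
extract_results() {
    # 1) mark the start of every <td> cell with "%"
    # 2) strip all remaining HTML tags
    # 3) drop lines that are now empty
    # 4) turn the "%" markers into spaces
    # 5) keep only lines mentioning Success / Failure / Error
    #    (grep -E is the current spelling of egrep)
    sed -e 's/<td>/%/g' -e 's/<[^>]*>//g' "$1" |
        grep -v '^$' |
        tr '%' ' ' |
        grep -E -i '(Suc|Fail|Err)'
}

# Run only when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
    extract_results "$1"
fi
```

Saved under an assumed name such as `extract.sh` and made executable with `chmod +x extract.sh`, it could be called as `./extract.sh 0_abc.html >> test.txt`.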




Response Number 3
Name: esu (by Raj)
Date: June 13, 2007 at 17:23:46 Pacific
Subject: help: parsing html file using perl
Reply:
Hi there,

There was a typo in my script... I corrected it and it worked well. However, the script produces the output below. What I don't want to see in the output is the first line, which is NameTestsErrorsFailuresTime(s).
There is also white space at the beginning of each line, which is not required, and extra white space after the status, before the time is printed on each line.

In short:
1) Get rid of the first line.
2) Remove the white space from the beginning of each line.
3) Remove the extra white space after the status and before the time(s).
================================
This is the current output from the above script:
====================================
NameTestsErrorsFailuresTime(s)
testLoginInitialLoad Success 7.938
testFailedSignOn Success 7.156
testLoginSignOn Success 8.078
testHomeTabNone Success 16.469

==========================
We want the following output:
===========================
testLoginInitialLoad Success 7.938
testFailedSignOn Success 7.156
testLoginSignOn Success 8.078
testHomeTabNone Success 16.469
testDefectProductComponents Failure null
testDefectRunsCompare Error Product missmatch: To compare runs, please select exactly two runs of the same product.



Response Number 4
Name: ghostdog
Date: June 15, 2007 at 23:17:14 Pacific
Subject: help: parsing html file using perl
Reply:
[code]
awk '/<td>/,/<\/td>/ { if ($0 ~ /TestSuite/) {next} ;
gsub("<td>|</td>"," ",$0)
gsub("<code>|</code>"," ",$0)
gsub("^ ","",$0)
print
}
' "file"
[/code]
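
The three clean-up points from Response 3 could also be handled by filtering the output of the sed pipeline from Response 1 instead of re-reading the HTML. This is a minimal sketch, assuming the unwanted header line always starts with `NameTests`; the function name `clean_results` is hypothetical. It does not join a failure description that the HTML splits across several lines.

```shell
#!/bin/sh
# Hypothetical post-filter: reads pipeline output on stdin,
# writes cleaned lines to stdout.
clean_results() {
    awk '
        /^NameTests/ { next }       # 1) drop the summary-table header line
        {
            sub(/^[ \t]+/, "")      # 2) strip leading white space
            gsub(/[ \t]+/, " ")     # 3) squeeze each run of white space to one space
            sub(/ $/, "")           #    and drop a trailing space, if any
            print
        }'
}
```

For example, appending `| clean_results > xyz.txt` to the pipeline from Response 1 would write the cleaned lines to xyz.txt.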





The information on Computing.Net is the opinions of its users. Such opinions may not be accurate and they are to be used at your own risk. Computing.Net cannot verify the validity of the statements made on this site. Computing.Net and Computing.Net, LLC hereby disclaim all responsibility and liability for the content of Computing.Net and its accuracy.

All content ©1996-2007 Computing.Net, LLC