How to extract data from HTML tags

February 23, 2009 at 05:55:05
Specs: Windows XP, Intel Core 2 Duo
Hi,
I have an HTML file as input and I need to extract the data between the HTML tags.

For example:

<tr bgcolor="#FFFFFF" align="right">
<td >Name: Sandy</td>
<td >Roll No.: 43234</td>
<td >Subject: DS</td>
</tr>

I need to read the Name, Roll No. and Subject fields from the table above (in the HTML file). Is there a way to do this using HTML, JavaScript or something similar, so that I can store the values in variables and reuse them later?

Thanks & Regards,
Sandeepan

Sandy!
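If the markup of the row is already available as a string, the three values can be pulled into variables with a rough regular-expression sketch like the one below. It assumes flat markup exactly like the sample (no nested tables, no `>` inside attribute values); a regex is not an HTML parser, so for real pages reading the cells through the browser's DOM is the safer route. `extractCells`, `studentName`, `rollNo` and `subject` are hypothetical names chosen for illustration.

```javascript
// Sketch only: collect the text between each <td ...> and </td> in an HTML string.
// Assumes simple, flat markup like the sample row in the question.
function extractCells(html) {
  var cells = [];
  var re = /<td[^>]*>([\s\S]*?)<\/td>/gi; // lazy match stops at the first </td>
  var match;
  while ((match = re.exec(html)) !== null) {
    cells.push(match[1].trim()); // drop surrounding whitespace
  }
  return cells;
}

// The sample row from the question, as a string.
var row =
  '<tr bgcolor="#FFFFFF" align="right">\n' +
  '<td >Name: Sandy</td>\n' +
  '<td >Roll No.: 43234</td>\n' +
  '<td >Subject: DS</td>\n' +
  '</tr>';

var cells = extractCells(row);
var studentName = cells[0]; // "Name: Sandy"
var rollNo = cells[1];      // "Roll No.: 43234"
var subject = cells[2];     // "Subject: DS"
console.log(studentName + " | " + rollNo + " | " + subject);
```

Each variable still holds the whole "Label: value" text; splitting off the label is a separate step.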





#1
February 23, 2009 at 08:44:40
HTML is static, but you can do this with JavaScript; however, it only works if the user has JavaScript enabled and the browser supports it.
<html>
<head>

<script type="text/javascript">

// Copy the contents of the cell with the given id into <div id="d">.
function show(id) {

   document.getElementById("d").innerHTML = document.getElementById(id).innerHTML;
}

</script>
</head>
<body>
<noscript>This page requires JavaScript, but it is currently disabled or unsupported.</noscript>

<table>
<tr bgcolor="#FFFFFF" align="right">
<td id='td1' onmouseover='show(this.id);'>Name: Sandy</td>
<td id='td2' onmouseover='show(this.id);'>Roll No.: 43234</td>
<td id='td3' onmouseover='show(this.id);'>Subject: DS</td>
</tr>
</table>
<div id='d'></div>
</body>
</html>

HTH



#2
February 23, 2009 at 14:12:56
Hi Shutat,

Thanks for the reply!
Your answer is really good, but my requirement is a bit different. In your example, you changed the existing HTML file and then used JS to get the data between the tags.
But I have the following requirement:
I have two files, a.htm and b.htm. a.htm contains the table above with Name, Roll No. and Subject. I can't change the contents of a.htm, but I want to store the Name, Roll No. and Subject values in three different variables, var1, var2 and var3, declared in b.htm. b.htm can use JavaScript or whatever else is required, but the input file a.htm must remain unchanged.
I hope you get my point!

Regards,
Sandeepan




#3
February 25, 2009 at 11:17:34
If both files are pure HTML, the only approach I can think of is to load a.html in an iframe inside b.html. However, the code below will fail ("permission denied" or similar) unless a.html and b.html come from the same origin (same protocol, domain and port); even two different subdomains of the same site count as different origins.

b.html

<html>
<head>

<script type="text/javascript">

// Will hold the ["label", "value"] pieces read from a.html.
var a=null, b=null, c=null;

function show(id) {

   try {
      // Reading the iframe's document throws if a.html is not same-origin,
      // so this lookup must be inside the try block.
      var dElement = document.getElementById(id).contentWindow.document.body.getElementsByTagName("td");

      if(dElement.length == 3) {
         // Split each "Label: value" cell into ["Label", " value"].
         a = dElement[0].innerHTML.split(":");
         b = dElement[1].innerHTML.split(":");
         c = dElement[2].innerHTML.split(":");
      
         alert(a[0] + " is " + a[1] + '\n' +
               b[0] + " is " + b[1] + '\n' +
               c[0] + " is " + c[1]);
      }
   }

   catch(e) { alert(e); }
}

</script>
</head>
<body onload="show('frm')">
<noscript>This page requires JavaScript to function correctly, but it is unavailable in your browser.</noscript>
<iframe id='frm' src='a.html'></iframe>
</body>
</html>
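In b.html, `split(":")` leaves a leading space on each value (`" Sandy"`), and a value that itself contains a colon would be cut into extra pieces. A minimal sketch of a safer split, assuming every cell has the form "Label: value": `parseField` is a hypothetical helper name, and inside `show()` it would be called as, for example, `a = parseField(dElement[0].innerHTML)`. It relies on `String.prototype.trim`, which older browsers such as IE8 do not provide.

```javascript
// Split a "Label: value" string at the FIRST colon only, so a value that
// contains colons (e.g. a hypothetical "Time: 10:30") stays whole, and trim
// the whitespace that split(":") would leave behind.
function parseField(text) {
  var i = text.indexOf(":");
  if (i === -1) {
    // No colon: treat the whole text as the label.
    return { label: text.trim(), value: "" };
  }
  return {
    label: text.slice(0, i).trim(),
    value: text.slice(i + 1).trim()
  };
}

var a = parseField("Name: Sandy");
var b = parseField("Roll No.: 43234");
var c = parseField("Subject: DS");
console.log(a.label + " is " + a.value); // Name is Sandy
console.log(b.label + " is " + b.value); // Roll No. is 43234
console.log(c.label + " is " + c.value); // Subject is DS
```

Returning an object with named `label` and `value` fields avoids indexing into `a[0]` and `a[1]` later.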

I'm not familiar with XML or other, more exotic flavors of HTML. :P

HTH




#4
February 26, 2009 at 04:10:29
Hi Shutat,

Thanks a lot for the solution!

Regards,
Sandy




