parse tree, what's it all about?

Original Message
Name: robber
Date: October 9, 2003 at 18:43:08 Pacific
Subject: parse tree, what's it all about?
OS: w98se
CPU/Ram: AMD K6-2,500mhz,256M
Comment:

I had my page checked by the W3C validator. It found no errors and sent back a "parse tree". What's it for, and how does one use this information? BTW, I know it's a stupid question, but this is the forum where even dumb people can get a patient and helpful answer. Thanks in advance.




Response Number 1
Name: anonproxy
Date: October 9, 2003 at 20:21:17 Pacific
Reply:

Parsing is when a process (a parser) reads and interprets data. The data generally has to follow a defined format and syntax, plus any other rules its specification imposes; if it doesn't, the parser returns an error or warning.

The particular parser we are talking about is an SGML (Standard Generalized Markup Language) parser. Now here is where it all gets fun. Markup languages, like HTML and XML, are not a new idea. Back in the 1970s a need arose for document structure: a way to describe not only the formatting of text but also its meaning (e.g. to separate formatting instructions from the text to be printed), and to let programs interpret the document in a variety of ways without having to embed all those ways in the document itself.

Programmers term this:

"The principle of separating document description from application function..."

http://www.sgmlsource.com/history/roots.htm

SGML parsers look at a document like an engineer might look at a machine: the component pieces, their arrangement, their relationships, and then their meaning (generally in that order, too). The parser reconstructs the document according to the SGML rules, and in this strict reconstruction errors become very clear. The document structure is usually pictured as a tree, with elements nested inside one another, and that is how the parser's interpretation is often expressed: the parse tree. It lets you see easily how each element of the document relates to the others and to the document as a whole.
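To make the tree idea concrete, here is a rough sketch in Python. It uses the standard library's html.parser, which is a lenient HTML tokenizer and not a validating SGML parser like the one behind the W3C service, so treat it as an illustration only. The names TreeBuilder, render, parse_tree and PAGE are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they never get children.
VOID = {"br", "hr", "img", "meta", "link", "input"}

class TreeBuilder(HTMLParser):
    """Builds a nested dict structure that mirrors element nesting."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # path from the root to the current element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

def parse_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def render(node, depth=0):
    """Returns the tree as a list of lines, indented by nesting depth."""
    lines = ["  " * depth + node["tag"]]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines

PAGE = ("<html><head><title>Hi</title></head>"
        "<body><h1>Hello</h1><p>Some <b>bold</b> text</p></body></html>")

print("\n".join(render(parse_tree(PAGE))))
```

The indentation in the printout shows which element sits inside which. The tree a validator sends back can be read the same way: if an element shows up under a different parent than you expected, your markup is nested differently than you thought.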

HTML is an application of SGML: a markup language defined by a DTD (Document Type Definition) written in SGML. An SGML parser should be able to handle HTML as long as the opening DOCTYPE line exists (declaring the document as HTML), and the parser will hold the document exactly to the specification that line cites.
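If you want to see which specification a page claims to follow, a small sketch like this pulls out the declaration line. It again relies on Python's html.parser; DoctypeFinder, find_doctype and PAGE are hypothetical names, and the HTML 4.01 Strict declaration is only a sample input.

```python
from html.parser import HTMLParser

class DoctypeFinder(HTMLParser):
    """Remembers the first DOCTYPE declaration seen in the document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">" of a declaration.
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl

def find_doctype(html):
    finder = DoctypeFinder()
    finder.feed(html)
    finder.close()
    return finder.doctype

PAGE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">'
        "<html><head><title>Hi</title></head><body></body></html>")

print(find_doctype(PAGE))
print(find_doctype("<html></html>"))  # no declaration, so None
```

A page with no declaration gives the parser nothing to hold it to, which is why the validator asks for one.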

A program that simply scans a document looking for something in particular (neither exhaustively nor strictly) is called a scanner. Basically, if you write an incomplete parser you can call it a scanner - that way the parser writers won't go after you for tarnishing their term.
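Here is a toy comparison of the two approaches, as a sketch only. scan_links is a "scanner" in the sense used above: a regular expression that grabs what it wants and ignores everything else. NestingChecker is a much stricter reader, though still nowhere near a real SGML parser: it assumes every element needs an explicit closing tag, which real HTML does not require for everything. All names here are invented for the example.

```python
import re
from html.parser import HTMLParser

def scan_links(html):
    """Scanner: grab double-quoted href values, ignore everything else."""
    return re.findall(r'href="([^"]*)"', html)

class NestingChecker(HTMLParser):
    """Toy strict check: every start tag must be closed, in order.

    Simplifying assumption: no void elements or optional end tags.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            self.errors.append(f"unexpected </{tag}>")
            return
        # Pop until the matching tag, reporting anything left open inside it.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.errors.append(f"<{top}> not closed before </{tag}>")

def check_nesting(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still open at the end of the document was never closed.
    return checker.errors + [f"<{t}> never closed" for t in checker.open_tags]

BAD = '<p><b><a href="a.html">one</a></p>'

print(scan_links(BAD))     # the scanner finds the link and is satisfied
print(check_nesting(BAD))  # the stricter check complains about <b>
```

The scanner reports the link and never notices the unclosed b element, while the stricter check flags it. That difference is why a validator parses the whole page instead of searching it.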

For more than you want to know about parsing, try this:

http://www.cs.vu.nl/~dick/PTAPG.html



Response Number 2
Name: robber
Date: October 9, 2003 at 20:29:58 Pacific
Reply:

useful and informative. again thanks! this is the only site on the web a person really needs for all around computing education. i always appreciate the surprisingly prompt and pertinent answers. hope i can help to contribute someday!

