Interpreting DTDs

Name: reticuli
Date: July 23, 2004 at 03:49:31 Pacific
OS: n/a
CPU/Ram: n/a
Comment:

I've been looking at the HTML 4.01 Loose DTD and have a few questions regarding interpretation, specifically with regard to the BODY element.

BODY is defined as:

<!ELEMENT BODY O O (%flow;)* +(INS|DEL)>

which essentially expands to

<!ELEMENT BODY O O (P|H1|H2|H3|H4|H5|H6|UL|OL|DIR|MENU|PRE|DL|DIV|CENTER|NOSCRIPT|NOFRAMES|BLOCKQUOTE|FORM|ISINDEX|HR|TABLE|FIELDSET|ADDRESS|#PCDATA|TT|I|B|U|S|STRIKE|BIG|SMALL|EM|STRONG|DFN|CODE|SAMP|KBD|VAR|CITE|ABBR|ACRONYM|A|IMG|APPLET|OBJECT|FONT|BASEFONT|BR|SCRIPT|MAP|Q|SUB|SUP|SPAN|BDO|IFRAME|INPUT|SELECT|TEXTAREA|LABEL|BUTTON)* +(INS|DEL)>
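For anyone following along, the expansion step can be sketched in a few lines of Python. This is only an illustration: the ENTITIES table is a hypothetical, heavily trimmed subset of the parameter entities in the DTD, and expand is a made-up helper name, so the output is much shorter than the real expansion shown above.

```python
import re

# Hypothetical, trimmed entity table; the real DTD defines many more names
# and much longer replacement texts.
ENTITIES = {
    "heading": "H1|H2|H3|H4|H5|H6",
    "list": "UL|OL|DIR|MENU",
    "block": "P|%heading;|%list;|PRE",
    "inline": "#PCDATA|EM|STRONG|A",
    "flow": "%block;|%inline;",
}

ENTITY_REF = re.compile(r"%([A-Za-z][\w.-]*);")


def expand(text, entities=ENTITIES):
    """Replace %name; references repeatedly until none are left."""
    while True:
        expanded = ENTITY_REF.sub(lambda m: entities[m.group(1)], text)
        if expanded == text:
            return expanded
        text = expanded


print(expand("<!ELEMENT BODY O O (%flow;)* +(INS|DEL)>"))
```

An unknown entity name raises KeyError here, which seemed preferable to silently leaving a reference unexpanded.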

First, the "O O" directly after BODY means that both the start tag and the end tag are optional (see http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.3.3). Why are these defined as optional when the A tag (<!ELEMENT A - -) has mandatory opening and closing tags? Surely it's just as necessary for the body of a document to be surrounded by opening and closing BODY tags?
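As a rough illustration of how the two flags can be read off a declaration, here is a minimal Python sketch. The regular expression and the tag_minimization helper are assumptions made for this example; they handle only simple declarations like the ones quoted in this post and make no attempt to strip "-- ... --" comments or parse the content model.

```python
import re

# Element name, start-tag flag, end-tag flag, then everything up to ">".
# "O" means the tag may be omitted, "-" means it is required.
DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([O-])\s+([O-])\s+(.*?)\s*>", re.S)


def tag_minimization(declaration):
    """Return the name, the two omission flags, and the raw content model."""
    match = DECL.match(declaration)
    if match is None:
        raise ValueError("not an ELEMENT declaration with minimization flags")
    name, start, end, content = match.groups()
    return {
        "element": name,
        "start_tag_optional": start == "O",
        "end_tag_optional": end == "O",
        "content": content,
    }


print(tag_minimization("<!ELEMENT BODY O O (%flow;)* +(INS|DEL)>"))
print(tag_minimization("<!ELEMENT A - - (%inline;)* -(A)>"))
```

Note that the flags describe the tags only. An element whose tags are omitted is still part of the parsed document, which is why a parser can infer BODY from its content.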

Second, the * at the end of the first set of brackets supposedly means that any of the elements contained in that set of brackets may occur zero or more times. What I don't understand is the +(INS|DEL) addition.

The HTML documentation states this means that either INS or DEL may occur, but I don't see anything stating how many times they may occur. Going by the rules defined in the page at http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.3.3.1 I can only presume this means either INS or DEL may occur but no more than once, and never both. Surely this is wrong?

Shouldn't this really have been defined in the DTD as

<!ELEMENT BODY - - (%flow;|INS|DEL)*>

I see the same problem with HEAD defined as

<!ELEMENT HEAD O O (TITLE & ISINDEX? & BASE?) +(SCRIPT|STYLE|META|LINK|OBJECT)>

Can anyone with practical experience of implementing or interpreting actual DTDs help here?

Thanks!



Response Number 1
Name: anonproxy
Date: July 23, 2004 at 10:20:14 Pacific
Reply:

Remember, when you say loose HTML, you basically mean your parser is really more of an interpreter and will have to make decisions about HTML markup (unless impractical correctness is your goal). In other words, the implementation will have to decide what it is going to make of bad/incomplete markup.

"Why are these defined as optional when the A tag..."

Because the body tag is largely unnecessary to rendering the document (it's just an entry point for display - more important as an element than a tag). The anchor tag is critical to HTML - HTML is all about links. Therefore, delimiting links is basic. The presence of the anchor tag also affects display and the user interface. Lastly, HTML Transitional is a little strange.

Remember, this is a DTD and not an implementation. That means it is not necessarily practical or real-world, just a ruleset.

"What I don't understand is the +(INS|DEL)..."

I would interpret that statement as: the element INS and/or the element DEL may occur multiple times, but both are optional. I know the "|" operator supposedly means "either, but not both", but I don't know of any HTML DTD that doesn't support both. A strict "either, but not both" reading seems wrong, and in almost any application I doubt this kind of adherence would matter.
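For what it's worth, in SGML the "+(...)" suffix is an inclusion exception: the listed elements may appear any number of times anywhere inside the element, including inside its descendants. The Python sketch below illustrates that reading and compares it with the "(%flow;|INS|DEL)*" rewrite from the question. The CONTENT table is a hypothetical, tiny stand-in for the real content models, documents are nested (name, children) tuples, and exclusion exceptions are ignored entirely.

```python
# Simplified content models: each element maps to the set of child names
# its model group allows (any number, any order). Hypothetical and trimmed.
CONTENT = {
    "BODY": {"P", "DIV"},
    "DIV": {"P", "DIV"},
    "P": {"EM"},
    "EM": set(),
    "INS": {"P", "EM"},
    "DEL": {"P", "EM"},
}


def invalid_children(node, content, inclusions, inherited=frozenset(), path=""):
    """Return paths of children allowed neither by the parent's model
    nor by an inclusion exception on the parent or any ancestor."""
    name, children = node
    path = f"{path}/{name}"
    active = inherited | inclusions.get(name, set())
    errors = []
    for child in children:
        if child[0] not in content[name] and child[0] not in active:
            errors.append(f"{path}/{child[0]}")
        errors.extend(invalid_children(child, content, inclusions, active, path))
    return errors


doc = ("BODY", [
    ("INS", []),
    ("P", [("INS", [("EM", [])]), ("DEL", [])]),
    ("DEL", []),
    ("INS", []),
])

# As declared: BODY's model plus the inclusion exception +(INS|DEL).
print(invalid_children(doc, CONTENT, {"BODY": {"INS", "DEL"}}))  # []

# The proposed rewrite: INS and DEL added to BODY's model, no inclusion.
rewritten = dict(CONTENT, BODY=CONTENT["BODY"] | {"INS", "DEL"})
print(invalid_children(doc, rewritten, {}))  # ['/BODY/P/INS', '/BODY/P/DEL']
```

Under this reading the two declarations are not equivalent: the rewrite allows INS and DEL only as direct children of BODY, while the inclusion also allows them inside P and other descendants.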

Admittedly, without an explicit syntax example, I can only guess. The W3C's validator supports 4.01 Loose as Transitional so a quick test would suffice. I just passed a document with two instances of both INS and DEL and no errors were returned.

"Shouldn't this really have been defined in the DTD as..."

I can't reconcile the validator's output with the DTD definitions. However, I believe your modified example would match the validator's interpretation.

"I see the same problem with HEAD defined as..."

This example pretty much settles the issue in my mind. Clearly, having META and SCRIPT nested in HEAD is legal, for example. The syntax +(A|B) must mean "A and/or B are allowed multiple times, but their inclusion is optional."
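That reading of HEAD can be written down as a small check. The Python sketch below is an illustration only: head_is_valid is a made-up helper that looks at direct child element names, treats "TITLE & ISINDEX? & BASE?" as "TITLE exactly once, ISINDEX and BASE at most once each, in any order", and treats the "+(...)" names as allowed anywhere, any number of times.

```python
from collections import Counter

REQUIRED = {"TITLE"}             # must occur exactly once
OPTIONAL = {"ISINDEX", "BASE"}   # the "?" items: at most once each
INCLUDED = {"SCRIPT", "STYLE", "META", "LINK", "OBJECT"}  # from +(...)


def head_is_valid(children):
    """Check a list of HEAD child element names against the model above."""
    # Included elements do not count against the model group at all.
    core = [name for name in children if name not in INCLUDED]
    counts = Counter(core)
    if set(counts) - REQUIRED - OPTIONAL:
        return False  # something HEAD does not allow, e.g. P
    if any(counts[name] != 1 for name in REQUIRED):
        return False  # TITLE missing or repeated
    return all(counts[name] <= 1 for name in OPTIONAL)


print(head_is_valid(["META", "TITLE", "META", "SCRIPT", "LINK"]))  # True
print(head_is_valid(["META", "SCRIPT"]))                           # False
print(head_is_valid(["TITLE", "BASE", "BASE"]))                    # False
```

The first call passes even though META appears twice and before TITLE, which matches the "allowed multiple times, but optional" interpretation.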

