End-tag omission. Bug in OpenSP?

Benjamin Niemann

2006-07-28 10:09:28 UTC

Hello,

before I file a bug report, I want to check whether my understanding of the
SGML standard is correct.

According to the SGML Handbook, the omitted tag minimization parameters
should not influence how the document is parsed (e.g. which tags are
implied).

But a simple test document (see below) shows that onsgmls (1.5.2) does make
a difference here (just checked nsgmls 1.3.4 - same result).

Am I correct that the end tag for A1 must be implied before the start tag
for A2 (as it is done for B1)?

---- the test document ----
<!DOCTYPE document [

<!ELEMENT document - - (a,b)>

<!ELEMENT a - - (a1|a2)+>
<!ELEMENT (a1,a2) - - (#PCDATA)>

<!ELEMENT b - - (b1|b2)+>
<!ELEMENT (b1,b2) - O (#PCDATA)>

]>
<document>

<a><a1>foo<a2>bar</a2></a>

<b><b1>foo<b2>bar</b2></b>

</document>
---- EOF ----

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/

Joe English

2006-07-31 14:29:25 UTC

Post by Benjamin Niemann
before I file a bug report, I want to check whether my understanding of the
SGML standard is correct.
According to the SGML Handbook, the omitted tag minimization parameters
should not influence how the document is parsed (e.g. which tags are
implied).

That's right. End-tag and start-tag inference is (sort of)
a form of error recovery: the omitted tag minimization parameters
are only examined when the parser encounters a tag or character
data that isn't allowed at that point in the document.

Post by Benjamin Niemann
But a simple test document (see below) shows that onsgmls (1.5.2) does make
a difference here (just checked nsgmls 1.3.4 - same result).
Am I correct that the end tag for A1 must be implied before the start tag
for A2 (as it is done for B1)?

Not necessarily. The test document is invalid, so the Standard
doesn't specify what the parser must report to the application
(other than an error diagnostic).
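
For illustration, here is a sketch of a variant of the test document that should be valid. It assumes the same DTD and an SGML declaration with OMITTAG YES. The only change is the explicit </a1> end tag, which the "- -" declaration for a1 requires. The end tag for b1 can stay omitted because b1 is declared "- O".

```sgml
<!DOCTYPE document [
<!ELEMENT document - - (a,b)>
<!ELEMENT a - - (a1|a2)+>
<!ELEMENT (a1,a2) - - (#PCDATA)>
<!ELEMENT b - - (b1|b2)+>
<!ELEMENT (b1,b2) - O (#PCDATA)>
]>
<document>
<a><a1>foo</a1><a2>bar</a2></a>
<b><b1>foo<b2>bar</b2></b>
</document>
```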

Post by Benjamin Niemann
---- the test document ----
<!DOCTYPE document [
<!ELEMENT document - - (a,b)>
<!ELEMENT a - - (a1|a2)+>
<!ELEMENT (a1,a2) - - (#PCDATA)>
<!ELEMENT b - - (b1|b2)+>
<!ELEMENT (b1,b2) - O (#PCDATA)>
]>
<document>
<a><a1>foo<a2>bar</a2></a>
<b><b1>foo<b2>bar</b2></b>
</document>
---- EOF ----

--Joe English

Benjamin Niemann

2006-08-01 08:23:39 UTC

Post by Joe English

Post by Benjamin Niemann
According to the SGML Handbook, the omitted tag minimization parameters
should not influence how the document is parsed (e.g. which tags are
implied).

Even after the omitted tag has been inferred, as I understand it.

Post by Joe English

Post by Benjamin Niemann
But a simple test document (see below) shows that onsgmls (1.5.2) does
make a difference here (just checked nsgmls 1.3.4 - same result).
Am I correct that the end tag for A1 must be implied before the start tag
for A2 (as it is done for B1)?

Not necessarily. The test document is invalid, so the Standard
doesn't specify what the parser must report to the application
(other than an error diagnostic).

I agree. But Goldfarb's annotation

'However, an SGML parser applies the rules of omission in 7.3.1 without
consideration of what the document type definition says about tag omission
for any element' (SGML Handbook, Clause 7.3.1)

seems to suggest that such a reportable error is a bit different from other
markup errors, because the document can be parsed correctly without any
form of error recovery, which the standard leaves undefined (I'm not
counting the insertion of omitted tags as error recovery here).

I admit that I'm being a bit pedantic now ;)

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/