Discussion:
End-tag omission. Bug in OpenSP?
Benjamin Niemann
2006-07-28 10:09:28 UTC
Hello,

before I file a bug report, I want to check whether my understanding of the
SGML standard is correct.

According to the SGML Handbook, the omitted tag minimization parameters
should not influence how the document is parsed (e.g. which tags are
implied).

But a simple test document (see below) shows that onsgmls (1.5.2) does make
a difference here (just checked nsgmls 1.3.4 - same result).

Am I correct that the end tag for A1 must be implied before the start tag
for A2 (as it is done for B1)?


---- the test document ----
<!DOCTYPE document [

<!ELEMENT document - - (a,b)>

<!ELEMENT a - - (a1|a2)+>
<!ELEMENT (a1,a2) - - (#PCDATA)>

<!ELEMENT b - - (b1|b2)+>
<!ELEMENT (b1,b2) - O (#PCDATA)>

]>
<document>
<!-- end tag for a1 is not optional but unambiguous -->
<a><a1>foo<a2>bar</a2></a>

<!-- output of onsgmls:
onsgmls:sgml-implicit-tags-04.sgml:16:15:E: document type does not allow
element "A2" here
onsgmls:sgml-implicit-tags-04.sgml:16:27:E: end tag for "A1" omitted, but
its declaration does not permit this
onsgmls:sgml-implicit-tags-04.sgml:16:5: start tag was here
(A
(A1
-foo
(A2
-bar
)A2
)A1
)A
-->

<!-- end tag for b1 is optional and unambiguous -->
<b><b1>foo<b2>bar</b2></b>

<!-- output of onsgmls:
(B
(B1
-foo
)B1
(B2
-bar
)B2
)B
-->

</document>
---- EOF ----
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
Joe English
2006-07-31 14:29:25 UTC
Post by Benjamin Niemann
before I file a bug report, I want to check whether my understanding of the
SGML standard is correct.
According to the SGML Handbook, the omitted tag minimization parameters
should not influence how the document is parsed (e.g. which tags are
implied).
That's right. End-tag and start-tag inference is (sort of)
a form of error recovery: the omitted tag minimization parameters
are only examined when the parser encounters a tag or character
data that isn't allowed at that point in the document.
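
To make the description above concrete, here is a minimal Python sketch of that kind of inference. It is a toy model, not how OpenSP or nsgmls work internally: the names ALLOWED, END_TAG_OMISSIBLE and parse are hypothetical, content models are reduced to "set of allowed children", and character data is never validated. It infers end tags purely from what is allowed at the current point and consults the omission flags only afterwards, to decide whether to report an error. Since the test document is invalid, the Standard does not require this output; it is just one possible reading.

```python
# Toy model of the thread's DTD. Content models are simplified to the set of
# allowed child elements (order and occurrence indicators are ignored).
ALLOWED = {
    "a": {"a1", "a2"},
    "b": {"b1", "b2"},
    "a1": set(), "a2": set(), "b1": set(), "b2": set(),  # (#PCDATA) only
}
# True where the declaration reads "- O", i.e. the end tag may be omitted.
END_TAG_OMISSIBLE = {"a": False, "a1": False, "a2": False,
                     "b": False, "b1": True, "b2": True}


def parse(tokens):
    """tokens: list of ("start", name), ("end", name) or ("data", text)."""
    stack, events, errors = [], [], []

    def close_top(inferred):
        name = stack.pop()
        events.append(")" + name.upper())
        # The omission flag is consulted only here, after the inference.
        if inferred and not END_TAG_OMISSIBLE[name]:
            errors.append('end tag for "%s" omitted, but its declaration '
                          'does not permit this' % name.upper())

    for kind, value in tokens:
        if kind == "data":
            events.append("-" + value)
        elif kind == "start":
            # Nearest open element whose (simplified) model allows the child.
            depth = next((i for i in reversed(range(len(stack)))
                          if value in ALLOWED[stack[i]]), None)
            if depth is not None:
                while len(stack) > depth + 1:
                    close_top(inferred=True)
            elif stack:
                errors.append('element "%s" not allowed here' % value.upper())
            stack.append(value)
            events.append("(" + value.upper())
        elif kind == "end":
            if value not in stack:
                errors.append('end tag for "%s" which is not open'
                              % value.upper())
                continue
            while stack[-1] != value:
                close_top(inferred=True)
            close_top(inferred=False)
    return events, errors


def sample(prefix):
    # Token stream for <x><x1>foo<x2>bar</x2></x> with x = prefix.
    return [("start", prefix), ("start", prefix + "1"), ("data", "foo"),
            ("start", prefix + "2"), ("data", "bar"),
            ("end", prefix + "2"), ("end", prefix)]


events_a, errors_a = parse(sample("a"))
events_b, errors_b = parse(sample("b"))
```

In this sketch both fragments produce the same element structure; the A fragment additionally yields one error for the omitted A1 end tag, while the B fragment yields none.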
Post by Benjamin Niemann
But a simple test document (see below) shows that onsgmls (1.5.2) does make
a difference here (just checked nsgmls 1.3.4 - same result).
Am I correct that the end tag for A1 must be implied before the start tag
for A2 (as it is done for B1)?
Not necessarily. The test document is invalid, so the Standard
doesn't specify what the parser must report to the application
(other than an error diagnostic).
--Joe English
Benjamin Niemann
2006-08-01 08:23:39 UTC
Post by Joe English
Post by Benjamin Niemann
According to the SGML Handbook, the omitted tag minimization parameters
should not influence how the document is parsed (e.g. which tags are
implied).
That's right. End-tag and start-tag inference is (sort of)
a form of error recovery: the omitted tag minimization parameters
are only examined when the parser encounters a tag or character
data that isn't allowed at that point in the document.
Even after the omitted tag has been inferred, as I understand it.
Post by Joe English
Post by Benjamin Niemann
But a simple test document (see below) shows that onsgmls (1.5.2) does
make a difference here (just checked nsgmls 1.3.4 - same result).
Am I correct that the end tag for A1 must be implied before the start tag
for A2 (as it is done for B1)?
Not necessarily. The test document is invalid, so the Standard
doesn't specify what the parser must report to the application
(other than an error diagnostic).
I agree. But Goldfarb's annotation

'However, an SGML parser applies the rules of omission in 7.3.1 without
consideration of what the document type definition says about tag omission
for any element' (SGML Handbook, Clause 7.3.1)

seems to suggest that such a reportable error is a bit different from other
markup errors, because the document can be parsed correctly without any
form of error recovery that the standard would leave undefined (I'm not
counting the insertion of omitted tags as error recovery here).

I admit that I'm being a bit pedantic now ;)
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/