Discussion:
nsgmls, html4.01 and hexadecimal char references like "&#x00C5;"
Rodericus
2012-06-28 14:51:26 UTC
Running "nsgmls -s tmp.htm", where the file "tmp.htm" contains:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" >
<HTML><HEAD><TITLE>Muster</TITLE></HEAD><BODY><P>
&#x00C5;
</BODY></HTML>

leads to the error:

nsgmls:tmp.html:3:2:E: "X00C5" is not a function name

The problem seems to be that SGML only accepts numeric character
references written in decimal, like "&#229;", but not hexadecimal
ones starting with "&#x".

Does anyone know a way to parse such an HTML file with nsgmls?
I need it to process the file.

Thanks
Rod.
Peter Flynn
2012-10-03 21:26:11 UTC
You must use onsgmls instead of nsgmls. See openjade.sourceforge.net
Post by Rodericus
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" >
<HTML><HEAD><TITLE>Muster</TITLE></HEAD><BODY><P>
&#x00C5;
</BODY></HTML>
nsgmls:tmp.html:3:2:E: "X00C5" is not a function name
The problem seems to be that SGML only accepts numeric character
references written in decimal, like "&#229;", but not hexadecimal
ones starting with "&#x".
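Unless otherwise specified, that is correct. ISO 8879:1986 does not
define a hexadecimal format for numeric character references. They are
technically non-SGML characters in the Reference Concrete Syntax, which
is what you are using when you don't specify anything else. In that
syntax, "&#" followed by a letter is read as a reference to a named
function character (RE, RS, SPACE), which is why the parser reports
that "X00C5" is not a function name.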
Post by Rodericus
Does anyone know a way to parse such an HTML file with nsgmls?
I need it to process the file.
You must specify the SGML Declaration for HTML4, e.g.:

$ onsgmls -e -g -s -u sgmlhtml.dec test.html

(Google for a copy if you don't have one). This lets onsgmls use the
SGML TC2 (WebSGML) Adaptations, which include some new delimiters,
among them HCRO, the delimiter that opens a hexadecimal character
reference.
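
If switching parsers is not an option, one possible workaround is to
rewrite the hexadecimal references as decimal ones before the file
reaches nsgmls. The following is only a minimal sketch in Python, not
part of nsgmls or OpenSP; the function name hex_refs_to_decimal is made
up for illustration, and the regular expression is applied naively to
the whole text (including comments and SCRIPT content). Whether the
resulting decimal references are accepted still depends on the document
character set in the SGML declaration in use.

```python
import re

# Matches hexadecimal character references such as &#x00C5; or &#XC5;
HEX_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_refs_to_decimal(text: str) -> str:
    """Rewrite each hexadecimal character reference in decimal form.

    Hypothetical helper for illustration: &#x00C5; becomes &#197;
    because 0xC5 == 197. Everything else is left untouched.
    """
    return HEX_REF.sub(lambda m: "&#%d;" % int(m.group(1), 16), text)
```

For the sample file above, the line "&#x00C5;" would come out as
"&#197;", which is the form the Reference Concrete Syntax understands.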

///Peter
