Arvin Portlock
2004-08-15 18:31:57 UTC
I'm using nsgmls v. 1.3.4 to process both SGML and XML files,
which is why I'm not using an XML parser. I notice that XML
validators catch certain illegal characters that nsgmls does not.
For example, both RXP and Xerces reject the byte 0xc0, but nsgmls
accepts it, even with -wxml:
RXP:
Error: Input error: Illegal UTF-8 start byte <0xc0> at file
offset 2618 in unnamed entity at line 44 char 8 of
file:///C:/PROGRA~1/rxp/badchar.xml
Is there a way to configure nsgmls, say by modifying the
SGML declaration, so that it will catch these illegal character
errors? My sample file is here:
http://www.geocities.com/_ratty/badchar.xml
(Geocities may munge the file, so it may not be usable.)
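
As a stopgap until there is an answer on the nsgmls side, a pre-check
outside the parser could be sketched roughly like this. It assumes the
file is meant to be UTF-8, and the function name find_invalid_utf8 is
made up for illustration. It only tests UTF-8 well-formedness, so it
would flag a stray 0xc0 but not other characters that XML forbids.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte offset, offending byte) for the first malformed
    UTF-8 sequence in data, or None if the data decodes cleanly.

    Hypothetical helper: it leans on Python's strict UTF-8 decoder
    rather than reproducing what RXP or Xerces do internally.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode
        return err.start, data[err.start]
    return None


if __name__ == "__main__":
    sample = b"<doc>bad \xc0 byte</doc>"
    hit = find_invalid_utf8(sample)
    if hit is not None:
        offset, byte = hit
        print(f"Illegal UTF-8 byte <0x{byte:02x}> at offset {offset}")
```

Running something like this over the raw bytes of each file before
handing it to nsgmls would at least report the offset of the first
bad byte.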
Arvin