Fallback HTML validation
Spartanicus
2007-01-10 22:34:54 UTC
I'd like to configure Liam Quinn's "A Real Validator" to validate
documents against the HTML 4.01 Strict DTD for documents that use
<!DOCTYPE html>. Afaics the "html.soc" configuration file used by ARV
already facilitates a fallback by validating against the HTML 4.01
Transitional DTD if it doesn't recognize a document's doctype:

-----
OVERRIDE YES

[list of common identifiers]

SGMLDECL HTML4.dcl

OVERRIDE NO

DOCTYPE html html401-loose.dtd
-----

Alas when using <!DOCTYPE html> ARV reports: "no internal or external
document type declaration subset; will parse without validation".

Anyone know how this can be achieved?
--
Spartanicus
David Håsäther
2007-01-10 23:00:17 UTC
Post by Spartanicus
I'd like to configure Liam Quinn's "A Real Validator" to validate
documents against the HTML 4.01 Strict DTD for documents that use
<!DOCTYPE html>. Afaics the "html.soc" configuration file used by ARV
already facilitates a fallback by validating against the HTML 4.01
Transitional DTD if it doesn't recognize a document's doctype:
-----
OVERRIDE YES
[list of common identifiers]
SGMLDECL HTML4.dcl
OVERRIDE NO
DOCTYPE html html401-loose.dtd
-----
Alas when using <!DOCTYPE html> ARV reports: "no internal or external
document type declaration subset; will parse without validation".
I've never used ARV, but it sounds like it's using SP. I've also
encountered this problem in the past, and as far as I know there is no
workaround.

You'll have to use <!doctype html system> I think.
--
David Håsäther
Spartanicus
2007-01-10 23:53:26 UTC
Post by David Håsäther
I've never used ARV, but it sounds like it's using SP. I've also
encountered this problem in the past, and as far as I know there is no
workaround.
Afaik the W3C online validator is also based on OpenSP; its web
interface has a doctype override option. If automatic fallback isn't
possible, then forcing validation using a specified doctype would also
work fine for me.
Post by David Håsäther
You'll have to use <!doctype html system> I think.
I remember getting automatic fallback to work when I tried that some
time ago, but I'd prefer to omit the system identifier.
--
Spartanicus
Benjamin Niemann
2007-01-11 08:53:03 UTC
Post by Spartanicus
Post by David Håsäther
I've never used ARV, but it sounds like it's using SP. I've also
encountered this problem in the past, and as far as I know there is no
workaround.
Afaik the W3C online validator is also based on OpenSP; its web
interface has a doctype override option. If automatic fallback isn't
possible, then forcing validation using a specified doctype would also
work fine for me.
If you enable "View Source" when doctype override is active, you'll see that
the old doctype declaration is commented out and a new one inserted. So
this is probably not some magic of the SGML processor; you'll have to
patch the document before passing it to (Open)SP.
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
William F Hammond
2007-01-12 16:28:42 UTC
Post by Spartanicus
I'd like to configure Liam Quinn's "A Real Validator" to validate
documents against the HTML 4.01 Strict DTD for documents that use
<!DOCTYPE html>.
. . .
Anyone know how this can be achieved?
In my view an instance beginning with <!DOCTYPE html> (with neither a
formal public identifier nor a system identifier) is not correct and
should be corrected.

I take a different view toward instances that simply begin with the
root element, i.e., <html>, even though the W3C definitions of the
various versions of classical HTML have _required_ a correct document
type declaration.

In this case for the purpose of validation with OpenSP one can from a
declaration-less instance assemble a correct document on standard
input for onsgmls -s by putting the document type declaration of one's
choice in the stdin stream immediately before the <html>. Of course,
OpenSP will need to be able to find both the DTD and the SGML
declaration (found in classic HTML definitions from W3C). So the
combination of the constructed input stream and the OpenSP command
line, possibly also with environment variables, must provide, one
way or another, for this or it will fail. As a practical matter,
therefore, one will want to equip oneself (once in a lifetime) with
a shell script for doing it.
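
The steps above can be sketched in Python rather than shell. This is only
an illustration under stated assumptions: the function names are made up
for the example, the HTML 4.01 Strict declaration is one possible choice,
and the catalog path passed to "onsgmls -c" is assumed to point at a
catalog that lets OpenSP find both the DTD and the SGML declaration.

```python
import subprocess

# Assumption: HTML 4.01 Strict is the declaration one wants to validate against.
HTML401_STRICT = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)


def prepend_doctype(document, doctype=HTML401_STRICT):
    """Return the declaration-less document with `doctype` placed before it."""
    return doctype + "\n" + document.lstrip()


def validate(document, catalog):
    """Pipe the assembled document to `onsgmls -s` on standard input.

    `catalog` is a hypothetical path to an SGML catalog file. Returns the
    exit status and whatever onsgmls wrote to stderr (its error messages).
    """
    result = subprocess.run(
        ["onsgmls", "-s", "-c", catalog],
        input=document,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

A call such as validate(prepend_doctype(text), "catalog") would then
report the parser's messages for the assembled stream; it requires onsgmls
to be installed and on the PATH.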

Going back to the <!DOCTYPE html> case: if you want to allow it, then
for validation modulo the declaration error the process above can be
modified to use a stream editor (like "sed") to substitute your
correct document type declaration for that incorrect declaration.
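
For those who would rather not write the sed expression, the same
substitution can be sketched in Python. The function name and the choice
of the HTML 4.01 Strict declaration are assumptions made for the example;
the result would still have to be fed to the validator separately.

```python
import re

# Assumption: HTML 4.01 Strict is the declaration to substitute in.
STRICT_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)

# Matches only the bare form, e.g. <!DOCTYPE html> or <!doctype HTML >.
# Declarations that carry PUBLIC or SYSTEM are deliberately left alone.
BARE_DOCTYPE = re.compile(r"<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def replace_bare_doctype(document, doctype=STRICT_DOCTYPE):
    """Swap the first bare <!DOCTYPE html> for a full declaration."""
    # A function is used as the replacement so that `doctype` is inserted
    # literally, without backslash or group-reference processing.
    return BARE_DOCTYPE.sub(lambda match: doctype, document, count=1)
```

Documents that already carry a public or system identifier pass through
unchanged, so the function can be applied to every file before validation.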

-- Bill
Spartanicus
2007-01-12 18:07:20 UTC
Post by William F Hammond
In my view an instance beginning with <!DOCTYPE html> (with neither a
formal public identifier nor a system identifier) is not correct and
should be corrected.
It will cause a document to be flagged as "invalid" when checked
against the W3C specs, but that verdict has no practical relevance.
Apart from DTD validators, no UA that I'm aware of pays any attention to
the doctype declaration for parsing the HTML. Only the value of the HTTP
"Content-Type" header matters: if it is "text/html", then that is how the
document will be parsed.

I'd prefer to omit the doctype entirely. Sadly, because MS introduced
different browser rendering modes triggered by the absence or presence
of certain doctype declaration formats, this is not an option. If I want
to avoid triggering quirks mode in browsers, then <!DOCTYPE html> is the
minimum I can get away with.

This is part of the philosophy behind the WhatWG's HTML5 / Web
Applications 1.0 initiative. IMO their initiative offers the best way
forward for HTML.

Also part of that philosophy is to abandon the use of a DTD and use
Relax NG and Schematron based tools to check for compliance with the
spec's full prose instead.

For me, validating against a custom DTD and SGML declaration provides a
check that sufficiently guards against parsing problems in current-day
UAs. Since I use a local validator to do that check, I thought I could
configure it to validate using <!DOCTYPE html>. Alas, it seems that SP
doesn't want to play ball, so for now I have resorted to using
<!DOCTYPE html SYSTEM> instead, which SP can be configured to validate.
Post by William F Hammond
In this case for the purpose of validation with OpenSP one can from a
declaration-less instance assemble a correct document on standard
input for onsgmls -s by putting the document type declaration of one's
choice in the stdin stream immediately before the <html>. Of course,
OpenSP will need to be able to find both the DTD and the SGML
declaration (found in classic HTML definitions from W3C). So the
combination of the constructed input stream and the OpenSP command
line, possibly also with environment variables, must provide, one
way or another, for this or it will fail. As a practical matter,
therefore, one will want to equip oneself (once in a lifetime) with
a shell script for doing it.
I'm not sufficiently capable with scripting languages to use OpenSP
directly, so I am using the Windows GUI application ARV, which, although
it uses OpenSP, cannot be used as a command-line application reading
from stdin.
--
Spartanicus