loiterer2
2008-04-25 21:09:57 UTC
Hi,
I would like to --well, at least, I am hoping I can-- do 2 things:
First, write a DTD parser. Then, concoct a DTD parser FAQ from a
hands-on point of view.
I have written a number of parsers, but they were all easy. SGML as
used in DTDs is proving to be much harder.
The fact that there's hardly any usable information on the Web does
not make things any easier. By 'usable', I mean stuff you can look up
that tells you what is what in a language you can understand. From
this POV, the books I have seen (not many) have been way over my head.
All I need is simple, example-oriented explanations. And, no, 'read the
code, stupid' does not help either.
So, I decided to ask here --hoping people would help create such
documentation.
While still hoping, I'll list a few examples of 'ENTITY' definitions:
<!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT" -- repeatable
head elements -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % list "UL | OL">
<!ENTITY % MediaDesc "CDATA" -- single or comma-separated list of
media descriptors -->
<!ENTITY % preformatted "PRE">
Now, it's obvious that what follows '<!' is what we are defining. In
this case an 'ENTITY'.
Then.. we have a '%' sign...
I am assuming that it tells us that we are about to find a name
string.
I don't remember seeing any 'ENTITY' definitions that did not have '%'
as the next non-whitespace char.
So, I am assuming that '%' must be present.
Is that a correct assumption?
If not, what else can there be, and what do they mean?
After the '%' char, we next have a string of non-whitespace characters.
I am assuming it is the 'name' of the 'ENTITY' we are defining.
Is that assumption correct, or could there be something else there,
meaning something else?
And, is it case-sensitive? I believe it isn't, but I might as well
have it confirmed.
Then, we have all sorts of gobbledygook..
I am assuming these to be the value(s) that the ENTITY can have.
I don't have much problem with those that are explicitly listed, but
what do "IGNORE" and "CDATA" mean?
What other stuff can be there apart from "IGNORE", "CDATA", and what
do they mean?
Could you help clarify these, please?
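For what it's worth, here is a minimal sketch, in Python, of how I currently read these declarations. The regular expression and the names ENTITY_DECL and parse_entity_decls are my own illustration, not part of any DTD library. It assumes a declaration has the shape: keyword, '%', name, quoted literal, optional -- comment --. It treats the '%' as optional only so that my assumption can be tested against a real DTD, and it stores the quoted literal as an opaque string without trying to interpret "IGNORE" or "CDATA".

```python
import re

# Assumed shape: <!ENTITY [%] name "literal" [-- comment --] >
# Only the quoted-literal form shown in the examples is covered.
ENTITY_DECL = re.compile(
    r'''<!ENTITY\s+
        (?P<percent>%\s+)?                        # optional percent sign
        (?P<name>[A-Za-z][A-Za-z0-9.\-]*)\s+      # entity name
        (?P<quote>["'])(?P<value>.*?)(?P=quote)   # quoted literal
        \s*(?:--(?P<comment>.*?)--\s*)?           # optional comment
        >''',
    re.VERBOSE | re.DOTALL,
)


def parse_entity_decls(dtd_text):
    """Return one dict per ENTITY declaration found in dtd_text."""
    decls = []
    for m in ENTITY_DECL.finditer(dtd_text):
        comment = m.group("comment")
        decls.append({
            "name": m.group("name"),
            "has_percent": m.group("percent") is not None,
            "value": m.group("value"),  # kept as a plain string
            # Collapse line breaks inside the comment to single spaces.
            "comment": " ".join(comment.split()) if comment else None,
        })
    return decls


SAMPLE = '''<!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT" -- repeatable
head elements -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % list "UL | OL">
'''

if __name__ == "__main__":
    for decl in parse_entity_decls(SAMPLE):
        print(decl)
```

A regex like this is only good enough for collecting examples; it will not cope with anything beyond the simple quoted form above, which is exactly why I am asking what else can appear.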
[I'll come back with others :) ]