Help: 'ELEMENT' definition
loiterer2
2008-04-25 21:09:57 UTC
Hi,

I would like to --well, at least, I am hoping I can-- do 2 things:

First, write a DTD parser. Then, concoct a DTD parser FAQ from a
hands-on point of view.

I have written a number of parsers, but they were all easy. SGML as
used in DTDs is proving to be much harder.

The fact that there's hardly any usable information on the Web does
not make things any easier. By 'usable', I mean stuff you can look up
that tells you what is what in language you can understand. From
this POV, the books I have seen (not many) have been way over my head.

All I need is simple, example-oriented explanations. And, no, 'read the
code, stupid' does not help either.

So, I decided to ask here --hoping people would help create such
documentation.

While still hoping, I'll list a few examples of 'ENTITY' definitions:

<!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT" -- repeatable
head elements -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % list "UL | OL">
<!ENTITY % MediaDesc "CDATA" -- single or comma-separated list of
media descriptors -->
<!ENTITY % preformatted "PRE">

Now, it's obvious that what follows '<!' is what we are defining. In
this case an 'ENTITY'.

Then.. we have a '%' sign...

I am assuming that it tells us that we are about to find a name
string.

I don't remember seeing any 'ENTITY' definitions that did not have '%'
as the next non-whitespace char.
So, I am assuming that '%' must be present.
Is that a correct assumption?
If not, what else can there be, and what do they mean?

After the '%' char, we next have a non-whitespace string.

I am assuming it means 'name' of the 'ENTITY' we are defining.

Is that assumption correct, or could there be something else there,
meaning something else?
And, is it case-sensitive --I believe it isn't but I might as well
have it confirmed.

Then, we have all sorts of gobbledygook..

I am assuming these to be the value(s) that ENTITY can have.

I don't have much problem with those that are explicitly listed, but
what do "IGNORE" and "CDATA" mean?

What other stuff can be there apart from "IGNORE", "CDATA", and what
do they mean?

Could you help clarify these please.

[I'll come back with others :) ]
Peter Flynn
2008-04-25 23:32:10 UTC
Post by loiterer2
Hi,
First, write a DTD parser. Then, concoct a DTD parser FAQ from a
hands-on point of view.
I have written a number of parsers, but they were all easy. SGML as
used in DTDs is proving to be much harder.
That is called Declaration Syntax (as opposed to Document Syntax).
It's not hard, per se, just different.
Post by loiterer2
The fact that there's hardly any usable information on the Web does
not make things any easier. By 'usable', I mean stuff you can look up
that tells you what is what in language you can understand. From
this POV, the books I have seen (not many) have been way over my head.
ISO 8879 (the standard document) is a commercial product of the ISO.
You have to buy it, or buy Goldfarb's _SGML Handbook_.
Post by loiterer2
All I need is simple, example-oriented explanations. And, no, 'read the
code, stupid' does not help either.
The best guide to writing DTDs is "SGML DTDs" by Maler and El Andaloussi.
Post by loiterer2
So, I decided to ask here --hoping people would help creating such
documentation.
<!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT" -- repeatable
head elements -->
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % list "UL | OL">
<!ENTITY % MediaDesc "CDATA" -- single or comma-separated list of
media descriptors -->
<!ENTITY % preformatted "PRE">
Now, it's obvious that what follows '<!' is what we are defining. In
this case an 'ENTITY'.
Yes, it's called the MDO (Markup Declaration Open).
Post by loiterer2
Then.. we have a '%' sign...
The Parameter Entity Reference Open (pero).
Post by loiterer2
I am assuming that it tells us that we are about to find a name
string.
Not quite. It defines that the name being declared is a PE (Parameter
Entity -- one that can be used only in replacements in the DTD) as
opposed to a General Entity (which is used in the actual document).
Post by loiterer2
I don't remember seeing any 'ENTITY' definitions that did not have '%'
as the next non-whitespace char.
That's because the only ones you have seen are PEs. Here are some
General Entities:

<!ENTITY IBM CDATA "International Business Machines">
<!ENTITY foobar SYSTEM "chapter1.sgm">

You use them in the text to refer to &IBM; or to include &foobar;
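
The PE/GE distinction above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a much-simplified declaration shape (optional '%', a name, an optional keyword, one quoted literal, optional trailing comments); the names ENTITY_RE and parse_entity are made up for the example, and many forms the standard allows (#DEFAULT, public identifiers with a system identifier, notations, and so on) are deliberately not handled.

```python
import re

# Simplified shape of an entity declaration. This is NOT the full grammar:
# it covers only "<!ENTITY [%] name [keyword] "literal" [-- comment --] >".
ENTITY_RE = re.compile(
    r'<!ENTITY\s+'
    r'(?P<pero>%\s+)?'                              # '%' marks a parameter entity
    r'(?P<name>[A-Za-z][A-Za-z0-9.\-]*)\s+'         # the entity name
    r'(?:(?P<keyword>CDATA|SDATA|PI|SYSTEM|PUBLIC)\s+)?'
    r'(?P<quote>["\'])(?P<text>.*?)(?P=quote)'      # the quoted literal
    r'(?:\s*--.*?--)*'                              # optional -- comments --
    r'\s*>',
    re.DOTALL,
)


def parse_entity(decl):
    """Return a dict describing one simple entity declaration."""
    m = ENTITY_RE.match(decl.strip())
    if m is None:
        raise ValueError("not a declaration this sketch understands")
    return {
        "kind": "parameter" if m.group("pero") else "general",
        "name": m.group("name"),
        "keyword": m.group("keyword"),   # None when no keyword is present
        "text": m.group("text"),
    }


if __name__ == "__main__":
    for decl in (
        '<!ENTITY % heading "H1|H2|H3|H4|H5|H6">',
        '<!ENTITY IBM CDATA "International Business Machines">',
        '<!ENTITY foobar SYSTEM "chapter1.sgm">',
    ):
        print(parse_entity(decl))
```

A regular expression is enough to show where the '%' sits and what it changes; a real parser would need a proper tokenizer driven by the productions in the standard.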
Post by loiterer2
So, I am assuming that '%' must be present.
Is that a correct assumption?
No. See pp 394-401 of Goldfarb, especially Productions 101-104.
Post by loiterer2
If not, what else can there be, and what do they mean?
The pero is only used for PEs. GEs don't have a symbol there, but they
may use the reserved string #DEFAULT (production 103).
Post by loiterer2
After the '%' char, we next have a non-whitespace string.
The entity name.
Post by loiterer2
I am assuming it means 'name' of the 'ENTITY' we are defining.
Yep.
Post by loiterer2
Is that assumption correct, or could there be something else there,
meaning something else?
Nope.
Post by loiterer2
And, is it case-sensitive --I believe it isn't but I might as well
have it confirmed.
This is defined in the SGML Declaration for the specific DTD. It can be
made case-sensitive or case-insensitive.
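
For a parser, this means case handling has to be a setting rather than something hard-coded. A minimal sketch, assuming a hypothetical EntityTable class with a fold_case flag that stands in for whatever the SGML declaration in force specifies (the declaration expresses this through its NAMECASE parameter). Keeping the first of several declarations for the same name reflects the usual rule for duplicate entity declarations as I understand it, and is marked as an assumption in the code.

```python
class EntityTable:
    """Stores entity text by name, optionally folding names to upper case."""

    def __init__(self, fold_case):
        # fold_case is a hypothetical flag: True means names are compared
        # case-insensitively, False means they are compared exactly.
        self.fold_case = fold_case
        self._entries = {}

    def _key(self, name):
        # Normalize the name once, so declare() and lookup() always agree.
        return name.upper() if self.fold_case else name

    def declare(self, name, text):
        # Assumption: the first declaration of a name is kept and any
        # later declaration of the same name is ignored.
        self._entries.setdefault(self._key(name), text)

    def lookup(self, name):
        # Raises KeyError for a name that was never declared.
        return self._entries[self._key(name)]


if __name__ == "__main__":
    folded = EntityTable(fold_case=True)
    folded.declare("heading", "H1|H2|H3|H4|H5|H6")
    print(folded.lookup("HEADING"))

    exact = EntityTable(fold_case=False)
    exact.declare("heading", "H1|H2|H3|H4|H5|H6")
    exact.declare("Heading", "a different entity")
    print(exact.lookup("Heading"))
```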
Post by loiterer2
Then, we have all sorts of gobbledygook..
This is the entity text. In the case of PEs, this is usually a content
model fragment, consisting of element type names in the form used in
element declarations, allowing the parameter entity reference to be used
in constructing complex content models. But it can also be a parameter
literal and a bunch of other things (like the HTML.Frameset value, used
in switching features on and off).
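
The replacement step described above can be sketched as plain text substitution. This is a minimal illustration, assuming references always take the form %name; (the cases where the closing ';' may be omitted are not handled), and the names PE_REF and expand are invented for the example. The entity values are the ones quoted earlier in the thread; the content model and the marked-section opener passed to expand() are illustrative inputs, not lines taken from any particular DTD.

```python
import re

# Matches a parameter entity reference of the form %name;
PE_REF = re.compile(r'%([A-Za-z][A-Za-z0-9.\-]*);')


def expand(text, entities, depth=0, max_depth=20):
    """Replace every %name; in text with its entity text, recursively."""
    if depth > max_depth:
        # Guards against entities that refer to each other in a loop.
        raise ValueError("parameter entities nested too deeply (circular?)")

    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undeclared parameter entity: " + name)
        # The replacement text may itself contain references.
        return expand(entities[name], entities, depth + 1, max_depth)

    return PE_REF.sub(replace, text)


if __name__ == "__main__":
    entities = {
        "heading": "H1|H2|H3|H4|H5|H6",
        "list": "UL | OL",
        "preformatted": "PRE",
        "HTML.Frameset": "IGNORE",
    }
    # A content model fragment built from the PEs.
    print(expand("(%heading; | %list; | %preformatted;)", entities))
    # The feature switch: the reference becomes the marked-section keyword.
    print(expand("<![ %HTML.Frameset; [", entities))
```

The second call shows why a value like "IGNORE" is useful: once expanded, it becomes the keyword of a marked section, so redefining the entity as "INCLUDE" switches that part of the DTD back on.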
Post by loiterer2
I am assuming these to be the value(s) that ENTITY can have.
No, you will have to read the standard to find out. It's 650pp.
Post by loiterer2
I don't have much problem with those that are explicitly listed, but
what do "IGNORE" and "CDATA" mean?
Too much to explain here. Read Eve Maler's book.
Post by loiterer2
What other stuff can be there apart from "IGNORE", "CDATA", and what
do they mean?
Lots and lots.
Post by loiterer2
Could you help clarify these please.
Could you please go and read the documentation first, then ask about
what more you need to know.

///Peter
