Discussion:
SGML resources nowadays
bsm
2006-10-28 13:07:42 UTC
Hi all,
I'm relatively new to sgml, so I'm stuck with writing my universal
markup parser. What I need is someone to point me to some kind of a
sgml specification (or at least a list of keywords it uses, I've got
the syntax summary - although not so very helpful), sgmldecl
explanation and the sgml catalog specification which is still there,
cause everything I found while googling is 8+ years old and offline
(sgmlopen, abortext, ...).

I would also appreciate if someone told me on which news server this
group is located, so I can access it using Outlook Express, cause all I
found was "news:comp.text.sgml" and that means absolutely nothing to
OE.

TIA

PS: The application I'm writing will be GPL (yes, free plus written in
Emacs+GCC) and programming is a hobby to me, so you understand why I
can't buy the ridiculously priced ISO papers, so please lend me a hand,
or at least some links.
Benjamin Niemann
2006-10-28 13:51:04 UTC
Hello,
Post by bsm
I'm relatively new to sgml, so I'm stuck with writing my universal
markup parser. What I need is someone to point me to some kind of a
sgml specification (or at least a list of keywords it uses, I've got
the syntax summary - although not so very helpful), sgmldecl
explanation and the sgml catalog specification which is still there,
cause everything I found while googling is 8+ years old and offline
(sgmlopen, abortext, ...).
PS: The application I'm writing will be GPL (yes, free plus written in
Emacs+GCC) and programming is a hobby to me, so you understand why I
can't buy the ridiculously priced ISO papers, so please lend me a hand,
or at least some links.
"The SGML Handbook" by Goldfarb contains the complete ISO standard. I got a
used copy via Amazon for EUR30, which I do not consider to be
"ridiculously priced" - though your opinion might differ.

For a first glance at the complexity of SGML, the grammar might be
sufficient to scare you off ;) You can find it at
<http://www.w3.org/MarkUp/SGML/productions.html>.
But there are a lot of subtleties which you cannot read from the grammar, so
there is no real way around reading the standard itself.
I don't know of any public publications which are sufficiently detailed to
replace 'The Handbook' - and I have searched quite a lot...


HTH
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
bsm
2006-10-29 10:06:37 UTC
Thanks for the link, I didn't find that one while searching, it should
be interesting :)
William F Hammond
2006-10-31 20:57:17 UTC
Post by Benjamin Niemann
Post by bsm
PS: The application I'm writing will be GPL (yes, free plus written in
Emacs+GCC) and programming is a hobby to me, so you understand why I
can't buy the ridiculously priced ISO papers, so please lend me a hand,
or at least some links.
To the OP: is there a particular purpose for which you wish to provide
a tool? IMO OpenSP would be hard to beat for what it has; it is
mainly a library, for which there are not all that many GPL tools I know of.
Post by Benjamin Niemann
"The SGML Handbook" by Goldfarb contains the complete ISO standard. I got a
used copy via Amazon for EUR30, which I do not consider to be
"ridiculously priced" - though your opinion might differ.
Yes.

Also reading the source code for OpenSP (find it chez OpenJade) might
be illuminating.

-- Bill
bsm
2006-11-01 21:54:03 UTC
Post by William F Hammond
To the OP: is there a particular purpose for which you wish to provide
a tool? IMO OpenSP would be hard to beat for what it has; it is
mainly a library, for which there are not all that many GPL tools I know of.
Actually, I'm not interested in providing a tool or beating OpenSP,
what I want is to implement a whole SGML parser for a personal project,
thus I don't care how long it takes or how hard it is - I'll keep it up
as long as it's fun :) The only thing I'm missing is the documentation
(I'm familiar with some stuff, but in order to make the parser work I
need all of it), cause the specification is not free and I can't find
anything complete-enough online.

I'm thinking of getting that book, it can't hurt - actually, the only
thing stopping me from getting it right now is one of my own rules
(don't even ask me why I have these) - "Don't invest your money in
personal projects - you'll go bankrupt" <g>
Post by William F Hammond
Also reading the source code for OpenSP (find it chez OpenJade) might
be illuminating.
I am reading OpenSP source and trying to understand how the whole thing
works, but it's not going as fast as I'd expected because of some of C++'s
annoyances. Don't get me wrong, I love C++, but it can tire even the
most persistent.
Peter Flynn
2006-11-02 00:00:14 UTC
Post by bsm
Actually, I'm not interested in providing a tool or beating OpenSP,
what I want is to implement a whole SGML parser for a personal project,
thus I don't care how long it takes or how hard it is - I'll keep it up
as long as it's fun :) The only thing I'm missing is the documentation
(I'm familiar with some stuff, but in order to make the parser work I
need all of it), cause the specification is not free and I can't find
anything complete-enough online.
That's because it's not free...
Post by bsm
I'm thinking of getting that book, it can't hurt - actually, the only
thing stopping me from getting it right now is one of my own rules
(don't even ask me why I have these) - "Don't invest your money in
personal projects - you'll go bankrupt" <g>
If you're going to be staying in the markup community, a copy of
Goldfarb's book is a good reference anyway. For the traps and gotchas,
have a look for Steve DeRose's _The SGML FAQ Book_ (ITCP).

///Peter
Tin Gherdanarra
2006-12-04 20:41:44 UTC
Post by bsm
I'm thinking of getting that book, it can't hurt - actually, the only
thing stopping me from getting it right now is one of my own rules
(don't even ask me why I have these) - "Don't invest your money in
personal projects - you'll go bankrupt" <g>
This is a harmful attitude. Pet projects that don't require books are
virtually always boring. Those that DO require books require them
precisely for the reason that they can save you time by looking how
others have done it (most of whom have much more practice in
the field than you). You can advance the progress of your pet project
by going the following route:

- take a second job as a cleaning lady, or even better: night porter
- save the money you earn moonlighting
- buy books, study them
- implement your pet just like the pros!

The time on the job will more than make up for the time you
would spend running messages. Inventing things for oneself is a
game of chance and pretty vulgar. You have people for this.
Besides, reading technical material is harder than thinking
and thus more manly. Inventing stuff is just that: Making things up.
Pipedreaming. That's why people buy pornography: Your own
imagination only gets you so far. Get yourself some books.
Goldfarb is mind-bogglingly comprehensive, and you should most
certainly get it if you want to be an SGML-nerd, but I recommend
Parseme.1st
http://www.amazon.com/PARSEME-1st-Software-Developers-Sean-McGrath/dp/0134889673/sr=11-1/qid=1165264775/ref=sr_11_1/002-8999821-9642428
for starts.

Trevor Jenkins
2006-10-28 15:30:00 UTC
... writing my universal markup parser. ...
PS: The application I'm writing will be GPL (yes, free plus written in
Emacs+GCC)
Why not settle for James Clark's SP parser with his sgml-mode for emacs?
Unless you're going to implement the stuff he didn't such as DATATAG; a sad
omission of a potentially useful feature in my opinion.
... and programming is a hobby to me, so you understand why I
can't buy the ridiculously priced ISO papers, so please lend me a hand,
or at least some links.
As others have or will do get yourself a copy of Goldfarb's SGML Handbook.
But before that ask yourself why most of the world has gone over to XML
and why ISO issued addenda to ISO 8879 to include support for XML.

Regards, Trevor

<>< Re: deemed!
bsm
2006-10-29 10:12:35 UTC
Post by Trevor Jenkins
Why not settle for James Clark's SP parser with his sgml-mode for emacs?
Unless you're going to implement the stuff he didn't such as DATATAG; a sad
omission of a potentially useful feature in my opinion.
I'm going for the complete parser, although OpenSP is a great app it
doesn't fit in with other stuff I'm planning + I don't like using 3rd
party libs in personal projects - I'm just like that :)
Post by Trevor Jenkins
As others have or will do get yourself a copy of Goldfarb's SGML Handbook.
But before that ask yourself why most of the world has gone over to XML
and why ISO issued addenda to ISO 8879 to include support for XML.
OK, I'll try getting it. I already have an XML parser developed, it was
no challenge at all. XML is too easy and nowhere near as general as
SGML ;)
Trevor Jenkins
2006-10-29 14:40:52 UTC
Post by bsm
Post by Trevor Jenkins
Why not settle for James Clark's SP parser with his sgml-mode for emacs?
Unless you're going to implement the stuff he didn't such as DATATAG; a sad
omission of a potentially useful feature in my opinion.
I'm going for the complete parser, although OpenSP is a great app it
doesn't fit in with other stuff I'm planning + I don't like using 3rd
party libs in personal projects - I'm just like that :)
You plan to implement everything then? Including DATATAG? I know that
James Clark had good reasons for NOT implementing some of the features of
SGML.
Post by bsm
Post by Trevor Jenkins
As others have or will do get yourself a copy of Goldfarb's SGML Handbook.
But before that ask yourself why most of the world has gone over to XML
and why ISO issued addenda to ISO 8879 to include support for XML.
OK, I'll try getting it. I already have an XML parser developed, it was
no challenge at all. XML is too easy and nowhere near as general as
SGML ;)
I remember the very first version of the XML definition; it stated that a
parser should be implementable by a computing science student in a couple
of weeks. (Sadly I've long since lost that ISO committee paper.) On the
first occasion when I (a post-grad computing scientist) had to write a
parser it did indeed take two weeks but the customer's programming
language of choice was VAX BASIC.

There's a good reason why XML is easy. Direct application of Pareto's
Rule. The 80% of SGML that people really want to use can be done in the
20% of code that implements the XML subset. Though now eight years on
pretty much 99.9% of what people wanted SGML to provide can be provided
through XML. DATATAG, while it would have been useful with some of the
contract work I did, can be realised by preprocessing the document through
a macroprocessor such as ML/I, Stage2, or m4.
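As a rough illustration of that preprocessing idea (not of how ISO 8879 actually specifies DATATAG), here is a minimal Python sketch. The element names `name`, `last` and `first`, and the rule that ", " inside `name` acts as a delimiter, are hypothetical choices made for this example; a macroprocessor such as m4 would do the same job with macro definitions instead of a regular expression.

```python
import re

# Hypothetical rule for this sketch: inside <name>...</name>, the string
# ", " in the data separates a <last> part from a <first> part.
NAME_WITH_DELIMITER = re.compile(r"<name>([^<,]+), ([^<]+)</name>")

def expand_data_tags(text: str) -> str:
    """Rewrite delimiter-bearing data into explicit markup before parsing."""
    return NAME_WITH_DELIMITER.sub(
        r"<name><last>\1</last><first>\2</first></name>", text
    )

source = "<author><name>Goldfarb, Charles</name></author>"
expanded = expand_data_tags(source)
print(expanded)
```

After this pass the document contains only ordinary tags, so a parser without DATATAG support never sees the delimiter convention.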

Regards, Trevor

<>< Re: deemed!
bsm
2006-11-01 21:58:30 UTC
Post by Trevor Jenkins
You plan to implement everything then? Including DATATAG? I know that
James Clark had good reasons for NOT implementing some of the features of
SGML.
Yes, but as I said - it's a personal project.
Post by Trevor Jenkins
I remember the very first version of the XML definition; it stated that a
parser should be implementable by a computing science student in a couple
of weeks. (Sadly I've long since lost that ISO committee paper.) On the
first occasion when I (a post-grad computing scientist) had to write a
parser it did indeed take two weeks but the customer's programming
language of choice was VAX BASIC.
What's VAX BASIC? Well, anyway, I think it's very possible in even less
time :)
Post by Trevor Jenkins
There's a good reason why XML is easy. Direct application of Pareto's
Rule. The 80% of SGML that people really want to use can be done in the
20% of code that implements the XML subset. Though now eight years on
pretty much 99.9% of what people wanted SGML to provide can be provided
through XML. DATATAG, while it would have been useful with some of the
contract work I did, can be realised by preprocessing the document through
a macroprocessor such as ML/I, Stage2, or m4.
Yes, I know, I use XML for almost every project, but this is just for
me and just for fun, so I want it to be a challenge and a learning
experience. Sounds dumb? Maybe not :)
Trevor Jenkins
2006-11-01 23:21:11 UTC
Post by bsm
Post by Trevor Jenkins
You plan to implement everything then? Including DATATAG? I know that
James Clark had good reasons for NOT implementing some of the features of
SGML.
Yes, but as I said - it's a personal project.
For my own projects I want something working as quickly as possible.
Except when the project is itself a programming task.
Post by bsm
Post by Trevor Jenkins
I remember the very first version of the XML definition; it stated that a
parser should be implementable by a computing science student in a couple
of weeks. (Sadly I've long since lost that ISO committee paper.) On the
first occasion when I (a post-grad computing scientist) had to write a
parser it did indeed take two weeks but the customer's programming
language of choice was VAX BASIC.
What's VAX BASIC? Well, anyway, I think it's very possible in even less
time :)
Think BASIC but running on a Digital (subsequently Compaq now HP) VAX/VMS
system. Some refinements over, for example, the original Dartmouth BASIC
of the 1960s or Microsoft Basic of the 1980s. Features of VAX BASIC were
inherited from either RSTS or RSX Basic and also shared with perl. The
most notable being statement qualifiers. Unfortunately the client site
where I was working at the time didn't allow use of such nifty syntactic
sugar (it confused their staff), but they did permit the use of procedures and
functions and the VAX implementation encouraged long names for
procedures/functions/variables rather than the Dartmouth or Microsoft
namespace of single letter with optional digit. I did write an object
oriented program in VAX BASIC for that contract --- rather like the
original SmallTalk interpreter was implemented in some variant of BASIC
--- but it confused the hell out of the maintenance team.

If writing that XML parser today I'd do it in python or ruby. Or better
yet just link in one of the many extant XML parsers and get on with other
parts of the project.
Post by bsm
Post by Trevor Jenkins
There's a good reason why XML is easy. Direct application of Pareto's
Rule. The 80% of SGML that people really want to use can be done in the
20% of code that implements the XML subset. Though now eight years on
pretty much 99.9% of what people wanted SGML to provide can be provided
through XML. DATATAG, while it would have been useful with some of the
contract work I did, can be realised by preprocessing the document through
a macroprocessor such as ML/I, Stage2, or m4.
Yes, I know, I use XML for almost every project, but this is just for
me and just for fun, so I want it to be a challenge and a learning
experience. Sounds dumb? Maybe not :)
It's your project you can do what you like. But as I said if it were me
I'd want something working now so would just go with SP or jade. If really
purist then I'd get it working with SP and think about substituting
something of my own invention later but by that time I'd have got some
other private project on the go so would just stick with SP. Plus leaves
the maintenance task to the OpenSP team.

Regards, Trevor

<>< Re: deemed!
Peter Flynn
2006-11-02 00:01:06 UTC
Post by bsm
What's VAX BASIC?
You don't want to know.
It was nowhere near as good as Pennsylvania Medical School BASIC :-)

///Peter
Peter Flynn
2006-10-29 00:04:55 UTC
Post by bsm
Hi all,
I'm relatively new to sgml, so I'm stuck with writing my universal
markup parser. What I need is someone to point me to some kind of a
sgml specification (or at least a list of keywords it uses, I've got
the syntax summary - although not so very helpful), sgmldecl
explanation and the sgml catalog specification which is still there,
cause everything I found while googling is 8+ years old and offline
(sgmlopen, abortext, ...).
Others have pointed you at the relevant resources.

The reason it's 8+ years old is that most applications have moved to
XML. Is there some reason why you feel you need to use SGML, or is it
just for completeness in your parser?
Post by bsm
I would also appreciate if someone told me on which news server this
group is located, so I can access it using Outlook Express, cause all I
found was "news:comp.text.sgml" and that means absolutely nothing to
OE.
Newsgroups aren't "located" on any particular server. The distribution
mechanism means any NNTP server can serve any group. What you need to
do is find out what the address of your own ISP's news server is (if
they have one), and whether they take a feed of comp.text.sgml (ask
them to do so if they don't). The OE interface is extremely sucky: far
better to get a real newsreader like Forté Agent or similar (for a list,
see http://www.newsreaders.com/win/clients.html).
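To see why "news:comp.text.sgml" alone gives a newsreader nothing to connect to, here is a small Python sketch that splits the URI with the standard library. The function name `describe_news_uri` and the host `news.example.invalid` are made up for illustration; substitute whatever NNTP server your ISP or a third party actually provides.

```python
from urllib.parse import urlsplit

def describe_news_uri(uri: str) -> dict:
    """Split a news:/nntp: URI into the parts a newsreader would need."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # None when the URI names only a group and no server
        "server": parts.hostname,
        "group": parts.path.lstrip("/"),
    }

# The form quoted in the question: a group name, but no host at all.
bare = describe_news_uri("news:comp.text.sgml")

# A hypothetical server-qualified form (the host is a placeholder).
qualified = describe_news_uri("nntp://news.example.invalid/comp.text.sgml")

print(bare)
print(qualified)
```

The bare form leaves the server to be filled in from the newsreader's own account settings, which is why the client has to be told about an NNTP server separately.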

///Peter
bsm
2006-10-29 10:21:18 UTC
Post by Peter Flynn
The reason it's 8+ years old is that most applications have moved to
XML. Is there some reason why you feel you need to use SGML, or is it
just for completeness in your parser?
Nope, it's just for the completeness of the parser - I want it to be
able to digest everything. My starting points were the W3C DTDs (all
HTMLs, MathML, SVG, ...), I learned a lot from those, but then I found
the SGML syntax summary page and found out there were numerous features
that weren't used in those, so now I'm trying to implement all those
features I never saw used.
Post by Peter Flynn
Newsgroups aren't "located" on any particular server. The distribution
mechanism means any NNTP server can serve any group. What you need to
do is find out what the address of your own ISP's news server is (if
they have one), and whether they take a feed of comp.text.sgml (ask
them to do so if they don't). The OE interface is extremely sucky: far
better to get a real newsreader like Forté Agent or similar (for a list,
see http://www.newsreaders.com/win/clients.html).
Oh, then it's my mistake, I thought all newsgroups were like e.g.
Borland's, you get a news server (e.g. newsgroups.borland.com) and a
group name and you can access it easily through sucky programs ;) No, my
ISP doesn't have an NNTP server, but google groups will do.
William F Hammond
2006-11-01 15:27:19 UTC
Post by Peter Flynn
The reason it's 8+ years old is that most applications have moved to
XML. Is there some reason why you feel you need to use SGML, or is it
just for completeness in your parser?
In the context of documents on the web, there now appears to be
reconsideration of the XML form of HTML. See, for example, this blog
entry on an MIT server by timbl,

http://dig.csail.mit.edu/breadcrumbs/node/166

where one finds:

Some things are clearer with hindsight of several years.
It is necessary to evolve HTML incrementally. The attempt to
get the world to switch to XML, including quotes around
attribute values and slashes in empty tags and namespaces
all at once didn't work. The large HTML-generating public
did not move, largely because the browsers didn't complain.
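
To make the quoted point concrete, here is a minimal Python sketch using only the standard library. It feeds the same HTML-style fragment (unquoted attribute value, empty element without a slash) to a lenient HTML tokenizer and to a strict XML parser. The class `TagCollector` and the helper `is_well_formed_xml` are names invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer reports."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))

def is_well_formed_xml(text: str) -> bool:
    """Return True only if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

html_style = "<p class=note>line one<br>line two</p>"
xml_style = '<p class="note">line one<br/>line two</p>'

collector = TagCollector()
collector.feed(html_style)
collector.close()

print(collector.seen)                  # the HTML tokenizer does not complain
print(is_well_formed_xml(html_style))  # the XML parser rejects it
print(is_well_formed_xml(xml_style))   # quoted value and <br/> are accepted
```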

Another reference is "whatwg":

http://www.whatwg.org/

The present XML standard (version 1) aside, what is unclear in all of
this is whether these discussions will lead to a new form of HTML that
can be formalized as SGML. I certainly think having an SGML umbrella
for HTML is a reasonable and important goal, but I sense a risk that
it could be lost.

On a brighter note, inasmuch as XML, version 1, was brought forth to
serve as SGML for the web, I suppose one thing that could emerge might
be next-generation XML, e.g., XML, version 2.

-- Bill
Tad McClellan
2006-11-01 19:40:46 UTC
Post by William F Hammond
will lead to a new form of HTML that
can be formalized as SGML.
Huh?

HTML was formalized as SGML from its inception. (broken "tools" aside.)
--
Tad McClellan SGML consulting
***@augustmail.com Perl programming
Fort Worth, Texas
William F Hammond
2006-11-02 17:47:38 UTC
Post by Tad McClellan
Post by William F Hammond
will lead to a new form of HTML that
can be formalized as SGML.
Huh?
HTML was formalized as SGML from its inception. (broken "tools" aside.)
Yes, of course, though only from version 2. At this point I sense a
risk that it will be sacrificed in various ways in the name of browser
inter-operability, not that I think such a sacrifice to be any of
wise, desirable, or helpful.

Imagine, for example, "web language" with element-by-element rules
about which specific cdata string attributes must be quoted.

Another example: "web language" with no document type declaration
though maybe an attribute on the root element to give browsers an idea
of what is at hand. Then imagine named cdata -- maybe sdata would be
better -- entities in such documents provided by authors who are led
to expect that a browser will have internal knowledge of the names.
E.g., in a firefox installation, check out res/entityTables.
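
A small Python sketch of the same contrast, assuming the standard library's HTML entity table as a stand-in for the kind of built-in knowledge a browser carries: the HTML side resolves a named entity from an internal table, while an XML parser given no document type declaration has nowhere to look the name up. The helper name `xml_accepts` is invented for this example.

```python
import html
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

fragment = "<p>caf&eacute;</p>"

# HTML tooling ships its own table of entity names.
codepoint = name2codepoint["eacute"]
decoded = html.unescape("caf&eacute;")

def xml_accepts(text: str) -> bool:
    """Return True only if an XML parser accepts the text as it stands."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(codepoint, decoded)
print(xml_accepts(fragment))            # no declaration, so the name is unknown
print(xml_accepts("<p>caf&#233;</p>"))  # a numeric reference needs no table
```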

-- Bill