XML to SGML conversion
j***@gmail.com
2006-06-23 14:54:31 UTC
Hi,

I'm currently working on a project where I am required to investigate
how to convert SGML to XML, and then back again.
From what I've seen on the web so far, James Clark's SP software can
convert SGML to XML, but thus far I cannot find anything that will go
the other way.

I realize that in converting SGML to XML I will lose a few things in
the conversion (as XML is a subset of SGML), so I'm not looking to
recreate the SGML 100%. I'm just looking for something that will
convert the XML back into valid SGML.

Is there anything out there that can do this?
William F Hammond
2006-06-23 17:10:56 UTC
Post by j***@gmail.com
I'm currently working on a project where I am required to investigate
how to convert SGML to XML, and then back again.
. . .
I realize that in converting SGML to XML I will lose a few things in
the conversion (as XML is a subset of SGML), so I'm not looking to
recreate the SGML 100%. I'm just looking for something that will
convert the XML back into valid SGML.
Are there particular document types involved, or is this just an
academic question? If the latter, its statement is insufficient.

An XML instance that is valid under a DTD, as opposed to a schema,
is essentially the same thing as a valid SGML instance under the
same DTD.

-- Bill
Tad McClellan
2006-06-23 22:12:12 UTC
Post by j***@gmail.com
I'm currently working on a project where I am required to investigate
how to convert SGML to XML, and then back again.
^^^^^^^^^^^^^^^^^^^

What does your project hope to gain by that?

I cannot conceive of any benefit, but maybe I'm just dense...
Post by j***@gmail.com
From what I've seen on the web so far, James Clark's SP software can
convert SGML to XML, but thus far I cannot find anything that will go
the other way.
Once you have made hamburger, it is hard to turn it back into steak.

:-)
--
Tad McClellan SGML consulting
***@augustmail.com Perl programming
Fort Worth, Texas
Peter Flynn
2006-06-23 23:47:19 UTC
Post by j***@gmail.com
Hi,
I'm currently working on a project where I am required to investigate
how to convert SGML to XML, and then back again.
From what I've seen on the web so far, James Clark's SP software can
convert SGML to XML, but thus far I cannot find anything that will go
the other way.
I realize that in converting SGML to XML I will lose a few things in
the conversion (as XML is a subset of SGML), so I'm not looking to
recreate the SGML 100%. I'm just looking for something that will
convert the XML back into valid SGML.
If you want valid SGML back, it will have to be 100%. You can't do
this one by kludging.
Post by j***@gmail.com
Is there anything out there that can do this?
I have a script that I wrote to do this for some TEI documents that
were maintained in SGML at the time, but needed temporarily in XML
where they might be edited, so round-tripping was required. I'll
see if I can dig it up, but essentially it involved

* running sgmlnorm to ensure the SGML was normalized

* a little bit of awk to identify all element types and attributes
used

* look them up in an XML DTD and extract their (case-sensitive)
XML equivalents

* a sed script to turn all SGML element type names and attribute
names into the appropriate casing

* use Earl Hood's perlSGML to identify the EMPTY elements from
the DTD

* use that info to add the NET slash on EMPTY elements

* scan for non-ASCII characters and use a lookup to replace
with char ent refs (rare: most SGML users stick with char
ent refs anyway)

* then you have well-formed XML (strip off the DocType Decl)

* do whatever is needed. If you *must* have a DTD, use the
guidelines in the XML FAQ to create one (Seán McGrath's
rules at http://xml.silmaril.ie/developers/dtdconv/)

* afterwards, remove the slashes from the EMPTY elements

* and if necessary, turn any non-ASCII characters back into
char ent refs

* paste the DocType Decl back on and revalidate

This assumes the SGML uses the Reference Concrete Syntax (RCS). What it doesn't handle is
casing of ID/IDREF[S] values. You'll have to edit them by hand
if you have to process the XML with the DTD.
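
The steps above can be roughed out in a few lines of Python. This is only a minimal sketch, not the script mentioned in this post: it uses regular expressions in place of awk, sed and perlSGML, and it assumes the input has already been normalized so that every tag is explicit and every attribute is written as name="value". The tables XML_NAMES and EMPTY_ELEMENTS are hypothetical stand-ins for what would really be extracted from the DTDs, and decimal numeric character references stand in for the entity lookup. DocType handling, marked sections and the casing of ID/IDREF[S] values are left out.

```python
import re

# Hypothetical lookup tables. In a real run these would be extracted from the
# XML DTD (case-sensitive names) and the SGML DTD (EMPTY declarations).
XML_NAMES = {"DIV": "div", "P": "p", "LB": "lb", "PB": "pb", "N": "n"}
EMPTY_ELEMENTS = {"lb", "pb"}

NAME = r"[A-Za-z][A-Za-z0-9.-]*"
# A start or end tag in normalized markup: every attribute is name="value".
TAG = re.compile(rf'<(/?)({NAME})((?:\s+{NAME}\s*=\s*"[^"]*")*)\s*(/?)>')
ATTR = re.compile(rf'({NAME})(\s*=\s*"[^"]*")')
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def xml_name(name):
    """Map a case-folded SGML name to its XML spelling (unchanged if unknown)."""
    return XML_NAMES.get(name.upper(), name)


def to_char_refs(text):
    """Replace each non-ASCII character with a decimal character reference."""
    return NON_ASCII.sub(lambda c: f"&#{ord(c.group())};", text)


def sgml_to_xml(text):
    """Re-case names, add the slash to EMPTY elements, escape non-ASCII."""
    def fix_tag(m):
        end_slash, name, attrs, _ = m.groups()
        name = xml_name(name)
        attrs = ATTR.sub(lambda a: xml_name(a.group(1)) + a.group(2), attrs)
        empty = "/" if not end_slash and name in EMPTY_ELEMENTS else ""
        return f"<{end_slash}{name}{attrs}{empty}>"

    return to_char_refs(TAG.sub(fix_tag, text))


def xml_to_sgml(text):
    """Drop the slash from empty-element tags and escape non-ASCII again."""
    def fix_tag(m):
        end_slash, name, attrs, _ = m.groups()
        return f"<{end_slash}{name}{attrs}>"

    return to_char_refs(TAG.sub(fix_tag, text))


if __name__ == "__main__":
    sample = '<DIV><P N="1">caf\u00e9<LB>next</P></DIV>'
    as_xml = sgml_to_xml(sample)
    print(as_xml)
    print(xml_to_sgml(as_xml))
```

The return trip leaves names in their XML casing, which is harmless only while the SGML declaration folds names to a single case, as the Reference Concrete Syntax does. The DocType Decl still has to be stripped before and pasted back after, and the result revalidated.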

///Peter
j***@gmail.com
2006-06-24 14:10:58 UTC
Hi,

Thanks for that info. If you can find that script that would be great,
otherwise I think you have provided me with enough info for me to take
a stab at it myself.

Thanks again.
Peter Flynn
2006-06-24 19:59:24 UTC
Post by j***@gmail.com
Hi,
Thanks for that info. If you can find that script that would be great,
otherwise I think you have provided me with enough info for me to take
a stab at it myself.
Unfortunately the only executing copy immediately to hand is a
significantly different version written for a client which I can't
publish. Let me know if you get stuck...

///Peter