Post by Jany Quintard
Does anyone know if there exists a batch converter able to transform Word
files into SGML (most probably (X)HTML).
I can use OpenOffice to convert Word files into pretty clean HTML files,
but I need to process each file by hand, and I would like some automatic
process, ideally running under Unix/Linux or cygwin.
The only reliable commercial product I have used for this was EBT's
DynaTag (Windows GUI and batch, Unix batch only). This was sold when
EBT ceased trading, and the purchaser (Enigma) doesn't seem to understand
SGML or XML, and has no clue what the product is. I believe it is still
theoretically available from them, but it's barely mentioned on their site.
However, I also believe it is still available, in a newer form, from Red
Bridge Software, who operate from the same premises as EBT used to in
Providence, RI. Ask 'em.
It was based on the Rainbow DTD, designed as an exchange format between
wordprocessors, so that you could write a translation from Product X to
Rainbow, and then use someone else's product to convert *from* Rainbow to
their format. This never really happened, but I have a copy of the original
Rainbow software still -- probably illegally -- at
ftp://ftp.ucc.ie/pub/sgml/rainbow/
DynaTag displays your Word file and lets you map Word named styles to XML
output elements you make up. It can handle nesting, splitting, and
encapsulation, and produces a DTD to accompany what you export. Having done
this for a sample batch, you can then let it rip on a big folder-full. The
GUI for training is Windows only, but the specs could be used with their
Unix batch engine. Having got it into your homebrew but representative XML
format, you can then write an XSLT transformation to turn it into whatever
output you need.
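
To make the style-mapping idea concrete, here is a minimal sketch in Python. It is not DynaTag's spec format or engine, just an illustration of the principle. It assumes the paragraphs have already been extracted from Word as (style name, text) pairs; the mapping table, the element names, and the function name are all made up for the example. It shows a flat mapping plus one simple case of encapsulation (wrapping a run of list paragraphs in a list element).

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical mapping: Word style name -> (element, optional wrapper element).
# The element names are invented; you would choose your own.
STYLE_MAP = {
    "Heading 1": ("title", None),
    "Body Text": ("para", None),
    "List Bullet": ("item", "list"),
}

def paragraphs_to_xml(paragraphs, style_map=STYLE_MAP, root_tag="doc"):
    """Turn (style_name, text) pairs into an ElementTree element."""
    root = ET.Element(root_tag)
    # Treat each run of consecutive paragraphs with the same style as a unit,
    # so that a wrapper element encloses the whole run.
    for style, run in groupby(paragraphs, key=lambda p: p[0]):
        tag, wrapper = style_map.get(style, ("unmapped", None))
        parent = ET.SubElement(root, wrapper) if wrapper else root
        for _, text in run:
            el = ET.SubElement(parent, tag)
            el.text = text
            if tag == "unmapped":
                # Keep the original style name so nothing is silently lost.
                el.set("style", style)
    return root

sample = [
    ("Heading 1", "Introduction"),
    ("Body Text", "Some text."),
    ("List Bullet", "First"),
    ("List Bullet", "Second"),
    ("Normal", "Untagged paragraph"),
]
xml_text = ET.tostring(paragraphs_to_xml(sample), encoding="unicode")
print(xml_text)
```

Paragraphs whose style is not in the table come out as "unmapped" elements carrying the original style name, which makes it easy to see what the mapping missed before running a whole folder through it.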
Caveat: like most Word-to-XML converters, the input Word *must* use named
styles for its formatting. If everything is just marked "Normal" then you
have only the fonts to guide you, and your conversion quality will suffer.
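
One way to check this before committing to a batch run is to count how the styles are distributed in a sample document. The sketch below assumes, as a simplification, that you already have the paragraphs as (style name, text) pairs from whatever extraction tool you use; the function name and the sample data are hypothetical.

```python
from collections import Counter

def style_report(paragraphs, default_style="Normal"):
    """Return per-style counts and the share of paragraphs in the default style."""
    counts = Counter(style for style, _ in paragraphs)
    total = sum(counts.values())
    # Guard against an empty document to avoid dividing by zero.
    share = counts[default_style] / total if total else 0.0
    return counts, share

sample = [
    ("Normal", "A heading faked with bold text"),
    ("Normal", "Body text"),
    ("Heading 1", "A real heading"),
    ("Normal", "More body text"),
]
counts, share = style_report(sample)
print(counts.most_common())
print(f"{share:.0%} of paragraphs use the default style")
```

A high share of the default style suggests the author formatted by hand, and the conversion will have little structure to work from.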
///Peter