Newbie with quick (hopefully) query
Nan
2005-01-04 02:10:54 UTC
Here's my situation: I am an editorial assistant for a scientific
journal that sends its issues to a website/search engine for scientific
journals. In the past, this website has been turning our PDFs into
SGML files, at a huge expense to us. So I decided to try to do it
myself. They gave me the complete files from one of our past issues
(SGML, image files, etc.) and their DTD (which opens in MSWord: is
that what it's supposed to open in?), plus some instructions.
Using the old files as a guide, and since I already knew HTML, it
wasn't hard to type in the sgml. My file looks exactly like the ones
they sent us. I used Dreamweaver to save it as a .sgml file, but I
don't really know if that was necessary.
I then asked the website organization if I could send them a test
'manuscript', and was given a very confusing response, suggesting if I
made any mistakes, they would charge me a huge amount of money.

So I need to make sure my sgml files are not flawed, or it defeats the
whole purpose of doing this. I don't have a 'parser', and frankly, I
don't know how to get one, or for that matter what the heck it is! Is
that something I could purchase to check my work before I send it off?
How do I get one, and what one do I use? And then once I have it,
what do I do with the DTD in order to check it?

I'd appreciate any help; this originally didn't seem very hard, and I
am still very confident that I can do it and save the society I work
for some money. I'd certainly like the opportunity to try.

thanks in advance
Peter N. M. Hansteen
2005-01-04 09:14:41 UTC
Post by Nan
Here's my situation: I am an editorial assistant for a scientific
journal that sends its issues to a website/search engine for scientific
journals. In the past, this website has been turning our PDFs into
SGML files, at a huge expense to us.
Converting from a purely presentation-oriented format such as PDF to
semantic markup such as SGML sounds like a very labor-intensive
process. Maybe the journal accepts submissions in a large number of
formats, or possibly as unstructured Word files, which amounts to the
same thing really. Depending on the specific requirements, this could be
anything from moderately difficult to extremely complex.
Post by Nan
So I decided to try to do it myself. They gave me the complete files
from one of our past issues (SGML, image files, etc.) and their DTD
(which opens in MSWord: is that what it's supposed to open in?), plus
some instructions.
DTD files are text, but not a kind of text file Word would know how to
deal with by itself. It is rather likely that opening a typical DTD file
with Word and saving it would add Word formatting, which would make the
file useless. Using Word in this context is not safe.
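One rough way to tell whether a DTD has already been re-saved in a word-processor format is to look at its first few bytes. The Python sketch below relies only on well-known file signatures (the OLE2 container used by legacy .doc files, the ZIP header used by .docx, and the `{\rtf` prefix of RTF). The function names are hypothetical, chosen for illustration.

```python
# Rough check: has a DTD been re-saved in a word-processor format?
# The signatures are standard magic numbers; the function names are hypothetical.

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc container
ZIP_MAGIC = b"PK\x03\x04"                        # .docx is a ZIP archive
RTF_MAGIC = b"{\\rtf"                            # Rich Text Format


def looks_like_word_file(data: bytes) -> bool:
    """Return True if the leading bytes match a word-processor format."""
    return data.startswith((OLE2_MAGIC, ZIP_MAGIC, RTF_MAGIC))


def check_dtd(path: str) -> str:
    """Read the first bytes of a file and report what they look like."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if looks_like_word_file(head):
        return "word-processor format: get a fresh copy of the DTD"
    return "no word-processor signature found"
```

A result of "no word-processor signature found" only rules out those three formats; it does not show that the file is a usable DTD.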
Post by Nan
Using the old files as a guide, and since I already knew HTML, it
wasn't hard to type in the sgml. My file looks exactly like the ones
they sent us. I used Dreamweaver to save it as a .sgml file, but I
don't really know if that was necessary.
Unless Dreamweaver is able to make use of the supplied DTD to validate
your file, its ability to save your work to a file with an .sgml
extension does not mean a lot. Dreamweaver gurus will know for sure, but
I would not trust a web authoring program to do this correctly.
Post by Nan
I then asked the website organization if I could send them a test
'manuscript', and was given a very confusing response, suggesting if I
made any mistakes, they would charge me a huge amount of money.
So I need to make sure my sgml files are not flawed, or it defeats the
whole purpose of doing this. I don't have a 'parser', and frankly, I
don't know how to get one, or for that matter what the heck it is! Is
that something I could purchase to check my work before I send it off?
How do I get one, and what one do I use? And then once I have it,
what do I do with the DTD in order to check it?
There are quite a few free software tools available for working with
SGML data, along with a few proprietary ones. However, you would need
some general understanding of the processes and concepts to achieve
useful results. The general concepts (among others, what a parser is and
why it is useful) are IIRC presented quite well in Norm Walsh's guide to
the DocBook DTD (DocBook is a rather popular DTD for technical
documentation such as software manuals), which can be read online at
http://docbook.org/tdg/en/index.html. I suggest you start by reading part I
of http://docbook.org/tdg/en/html/docbook.html for an overview.

There are several other good online resources available, among them
Markus Hoenicka's combined tutorial and installation guide (where you
set up a Windows machine with free SGML/XML tools as you go along) at
http://ourworld.compuserve.com/homepages/hoenicka_markus/ntsgml.html
Note that some of the download links may refer to out-of-date versions;
in those cases, a more recent version is usually available from the same
directory. Finally, the slides for one of my presentations are at
http://www.bgnett.no/~peter/docbook-presentation/book1.html (really
intended to be accompanied by some spoken commentary), where you might
find the references section useful.
--
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://www.blug.linux.no/rfc1149/ http://www.datadok.no/ http://www.nuug.no/
"First, we kill all the spammers" The Usenet Bard, "Twice-forwarded tales"
Peter Flynn
2005-01-08 22:02:16 UTC
Post by Nan
Here's my situation: I am an editorial assistant for a scientific
journal that sends its issues to a website/search engine for scientific
journals. In the past, this website has been turning our PDFs into
SGML files, at a huge expense to us.
I'm sure it is. This is going backwards, like trying to turn hamburgers
back into whole cows.
Post by Nan
So I decided to try to do it
myself. They gave me the complete files from one of our past issues
(SGML, image files, etc.) and their DTD (which opens in MSWord: is
that what it's supposed to open in?),
Only to view it. To make use of it you need an SGML editor. Two you
could try are

Emacs
(Windows version: http://www.gnu.org/software/emacs/windows/ntemacs.html)
with psgml-mode (http://www.lysator.liu.se/projects/about_psgml.html) and
the nsgmls parser (http://www.jclark.com/sp/);

epcEdit
(http://www.epcedit.com).

Emacs is a plaintext-mode editor (you see all the pointy brackets);
epcEdit is typographical. There are lots of others, but Emacs is free,
and epcEdit has a timed free period.
Post by Nan
plus some instructions.
Using the old files as a guide, and since I already knew HTML, it
wasn't hard to type in the sgml. My file looks exactly like the ones
they sent us. I used Dreamweaver to save it as a .sgml file, but I
don't really know if that was necessary.
That may have damaged it. Dreamweaver doesn't know anything about SGML.
Post by Nan
I then asked the website organization if I could send them a test
'manuscript', and was given a very confusing response, suggesting if I
made any mistakes, they would charge me a huge amount of money.
That was naughty of them, but understandable. SGML has rules which *must*
be obeyed (set out in the DTD). Any departure from them renders the file
invalid and useless until corrected.
Post by Nan
So I need to make sure my sgml files are not flawed, or it defeats the
whole purpose of doing this.
Absolutely.
Post by Nan
I don't have a 'parser', and frankly, I
don't know how to get one, or for that matter what the heck it is! Is
that something I could purchase to check my work before I send it off?
nsgmls is free.
Post by Nan
How do I get one, and what one do I use? And then once I have it,
what do I do with the DTD in order to check it?
You run the parser and give it the name of the file to parse.
The parser reads the <!DOCTYPE declaration and finds the DTD.
It then checks that all your document markup conforms to the
rules in the DTD.
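As a rough illustration of those steps, here is a Python sketch. It assumes the nsgmls program from SP is installed and on the PATH, and it uses nsgmls's `-s` option, which suppresses the normal output so that only error messages are printed. The helper names and the file name `article.sgml` are made up for the example. The regular expression only peeks at the DOCTYPE declaration for the reader's benefit; locating and reading the DTD is the parser's job.

```python
import re
import shutil
import subprocess

# Matches the start of a declaration such as
#   <!DOCTYPE article PUBLIC "-//Example//DTD Article//EN" "article.dtd">
# and captures the root element name plus the PUBLIC or SYSTEM keyword.
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+([^\s>\[]+)(?:\s+(PUBLIC|SYSTEM))?", re.IGNORECASE
)


def find_doctype(text):
    """Return (root_element, keyword) from the DOCTYPE declaration, or None."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        return None
    keyword = match.group(2)
    return (match.group(1), keyword.upper() if keyword else None)


def build_command(path):
    """Command line for a validation-only run of nsgmls (-s: errors only)."""
    return ["nsgmls", "-s", path]


def validate(path):
    """Run nsgmls on the file and return its messages as a list of lines."""
    if shutil.which("nsgmls") is None:
        raise RuntimeError("nsgmls not found on PATH")
    result = subprocess.run(build_command(path), capture_output=True, text=True)
    # nsgmls normally writes its error messages to standard error.
    return result.stderr.splitlines()
```

Calling `validate("article.sgml")` and getting back an empty list would mean the parser reported no errors. If the DOCTYPE uses a PUBLIC identifier, the parser may also need a catalog file that maps that identifier to the DTD file.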
Post by Nan
I'd appreciate any help; this originally didn't seem very hard, and I
am still very confident that I can do it and save the society I work
for some money. I'd certainly like the opportunity to try.
It's not hard, but it's not wordprocessing. My book on SGML and XML
tools[1] describes the whole procedure in detail.

///Peter
1. Understanding SGML and XML Tools, Kluwer, 1998. ISBN 0-7923-8169-6
--
"The cat in the box is both a wave and a particle"
-- Terry Pratchett, introducing quantum physics in _The Authentic Cat_