Discussion:
Reader for SGML/XML/TEI Lite
Eve M. Behr
2007-06-27 07:58:41 UTC
I am a volunteer at the Distributed Proofreaders project (an offshoot
of the Gutenberg project). I have been noticing that a lot of their
stuff is being encoded using the TEI Lite standard.

Now my problem is not so much that I can't manage to produce code. I
have years of working with tagged languages like Scribe and LaTeX that
separate content and style. But each of those packages had a document
compiler integrated into the package. I got quite expert with
manipulating their stylesheets to produce acceptable looking
documents.

But this XML/SGML stuff defeats me. What I have been doing is using
Word macros to convert back and forth between HTML and TEI. In this
way I have produced over a hundred pages of baseball statistics. I
use the HTML so that I can proof the tables (make sure that they're
rectangular and the headings are spanning correctly) and then submit
in TEI. But I would like to be able to deal with the XML (it looks
and behaves like XML) directly.

It is frustrating that I am unable (in spite of googling all over the
net) to find a coherent explanation of how to set something up so that
I can load up an XML document and actually read it like a text. I
have downloaded a stylesheet that seems to have the correct tags, but
nothing I do with my browser will pull up a TEI book so that I can see
the tables and the poetry and the formatted block quotes as they
should be.

Like I say, there is tons of material about getting stuff *into*
XML/SGML/TEI. Is there any coherent explanation about where one
stores a style sheet (and how to know you have the right one) and how
one gets a piece of XML to read said style sheet to produce coherent
copy? Even the person who wrote "A Gentle Guide to XML" admitted that
the Guide had to be written in HTML because he had no idea how to
produce something readable in anything else.

Surely the goal of the Text Encoding Initiative and SGML isn't that
people are forced to read:

<anthology>
<poem><title>The SICK ROSE</title>
<stanza>
<line>O Rose thou art sick.</line>
<line>The invisible worm,</line>
<line>That flies in the night</line>
<line>In the howling storm:</line>
</stanza>
<stanza>
<line>Has found out thy bed</line>
<line>Of crimson joy:</line>
<line>And his dark secret love</line>
<line>Does thy life destroy.</line>
</stanza>
</poem>
<!-- more poems go here -->
</anthology>

Instead of the poem within.

How do I get access to the poem as it was intended to be seen without
having to learn Perl or other types of advanced programming?

Eve M. Behr

<admittedly a bit cranky and incoherent after several hours of trying
to crack the code>
Peter Flynn
2007-06-27 19:55:05 UTC
Post by Eve M. Behr
I am a volunteer at the Distributed Proofreaders project (an offshoot
of the Gutenberg project). I have been noticing that a lot of their
stuff is being encoded using the TEI Lite standard.
Now my problem is not so much that I can't manage to produce code. I
have years of working with tagged languages like Scribe and LaTeX that
separate content and style. But each of those packages had a document
compiler integrated into the package. I got quite expert with
manipulating their stylesheets to produce acceptable looking
documents.
But this XML/SGML stuff defeats me. What I have been doing is using
Word macros to convert back and forth between HTML and TEI. In this
way I have produced over a hundred pages of baseball statistics. I
use the HTML so that I can proof the tables (make sure that they're
rectangular and the headings are spanning correctly) and then submit
in TEI. But I would like to be able to deal with the XML (it looks
and behaves like XML) directly.
OK...you need the following (you probably have 1/2/3 already)

1. a copy of the TEI Lite DTD and/or Schema that the documents reference
2. a knowledge of XML
3. an understanding of the TEI Lite markup
4. an XML editor, preferably one with an XSLT IDE
5. a knowledge of XSLT
6. an XSLT processor (eg Saxon)
7. time

XSLT is an XML processing language expressed in XML itself, for
transforming XML into other formats, including HTML, other XML formats,
and plaintext (eg LaTeX, CSV, etc).

An XSLT processor applies your XSLT code to a specified XML file and
produces the output. XSLT2 can also take non-XML files as input and lets
you write code to generate XML (or other formats).

This is significantly more robust and reusable than any other method,
IMHO, including home-hacked Perl scripts, macros, and other converters
like Omnimark and Metamorphosis.

[XSL -- without the T -- does a similar job but produces PDF direct.
Personally I prefer to use XSLT to generate LaTeX, but I'm notorious.]

By way of demo, here's some XML:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "/dtds/teixlite.dtd">
<TEI.2>
<teiHeader>
<fileDesc>
<titleStmt>
<title>Demo</title>
</titleStmt>
<publicationStmt>
<distributor>Silmaril Consultants</distributor>
<availability>
<p>Unrestricted</p>
</availability>
</publicationStmt>
<sourceDesc>
<p>Original code</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div0>
<head>Demo of TEI Lite XML</head>
<p>Hello, world!</p>
</div0>
</body>
</text>
</TEI.2>

and here's some XSLT:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="html"/>

<xsl:template match="/">
<html>
<head>
<title>
<xsl:value-of select="TEI.2/teiHeader/fileDesc/titleStmt/title"/>
</title>
</head>
<body>
<xsl:apply-templates select="TEI.2/text"/>
</body>
</html>
</xsl:template>

<xsl:template match="div0/head">
<h1>
<xsl:apply-templates/>
</h1>
</xsl:template>

<xsl:template match="p">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>

</xsl:stylesheet>

and if you run Saxon:

$ java -jar /usr/local/saxon/b8.5/saxon8.jar -o test.html test.xml test.xsl

you get the following output

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Demo</title>
</head>
<body>
<h1>Demo of TEI Lite XML</h1>
<p>Hello, world!</p>
</body>
</html>

and here's another XSLT that produces LaTeX source:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
<xsl:text>\documentclass{article}\begin{document}</xsl:text>
<xsl:apply-templates
select="TEI.2/teiHeader/fileDesc/titleStmt/title"/>
<xsl:apply-templates select="TEI.2/text"/>
<xsl:text>\end{document}</xsl:text>
</xsl:template>

<xsl:template match="title">
<xsl:text>\title{</xsl:text>
<xsl:apply-templates/>
<xsl:text>}\author{Unidentified}\maketitle </xsl:text>
</xsl:template>

<xsl:template match="div0/head">
<xsl:text>\section{</xsl:text>
<xsl:apply-templates/>
<xsl:text>}</xsl:text>
</xsl:template>

<xsl:template match="p">
<xsl:apply-templates/>
<xsl:text>&#xa;&#xa;</xsl:text>
</xsl:template>

</xsl:stylesheet>
Post by Eve M. Behr
It is frustrating that I am unable (in spite of googling all over the
Google is probably the wrong tool for doing that.
Post by Eve M. Behr
net) to find a coherent explanation of how to set something up so that
I can load up an XML document and actually read it like a text.
That's a different matter. You have the following options:

a. use a browser that reads XML (Firefox, MSIE, DocZilla, etc)
b. open the file in an editor
c. have the file served to you by an XML server (eg AxKit, Cocoon,
PropelX, etc)

But as you have already deduced, you need a stylesheet to express how
you want it to look, because XML typically does not carry styling
information, only content-descriptive markup.

This can be done with CSS, provided you don't want to change the order
of the document, or do anything cute like generating a table of
contents. For that you have to use XSLT, because it processes the file
and can therefore reach into it and cherry-pick the bits you want where
you want them, which CSS can't do. You can of course also apply CSS post
hoc to the generated HTML.
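
As a rough sketch of the table-of-contents case, assuming the same TEI Lite structure as the demo above (div0 elements each with a head), a stylesheet along these lines should list the section titles ahead of the body text. The mode name "toc" is just a label chosen for this example, not something TEI or XSLT defines.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of select="TEI.2/teiHeader/fileDesc/titleStmt/title"/>
        </title>
      </head>
      <body>
        <!-- First pass: reach into the document and list every div0 heading -->
        <ul>
          <xsl:apply-templates select="TEI.2/text/body/div0/head" mode="toc"/>
        </ul>
        <!-- Second pass: format the text itself in document order -->
        <xsl:apply-templates select="TEI.2/text"/>
      </body>
    </html>
  </xsl:template>

  <!-- In "toc" mode a heading becomes a list item -->
  <xsl:template match="head" mode="toc">
    <li><xsl:value-of select="."/></li>
  </xsl:template>

  <!-- In normal mode the same heading becomes a section title -->
  <xsl:template match="div0/head">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The same heading is visited twice, once per mode, which is the kind of cherry-picking CSS has no way to express.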
Post by Eve M. Behr
I have downloaded a stylesheet that seems to have the correct tags, but
nothing I do with my browser will pull up a TEI book so that I can see
the tables and the poetry and the formatted block quotes as they
should be.
Is it a CSS stylesheet or an XSLT stylesheet?

Add ONE of the following to the XML document if it's not already there,
after the <?xml version="1.0"?> but before the <TEI.2> start-tag:

<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<?xml-stylesheet href="foo.css" type="text/css"?>
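
For illustration, a minimal sketch of where the line goes, using the demo document and a hypothetical stylesheet called foo.xsl sitting in the same folder as the XML file (the body is abbreviated here):

```xml
<?xml version="1.0"?>
<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<!DOCTYPE TEI.2 SYSTEM "/dtds/teixlite.dtd">
<TEI.2>
  <!-- teiHeader and text go here, exactly as before -->
</TEI.2>
```

Assuming the document is saved as something like test.xml and the stylesheet as foo.xsl (or foo.css for the CSS variant), opening test.xml in the browser should then pick the stylesheet up.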
Post by Eve M. Behr
Like I say, there is tons of material about getting stuff *into*
XML/SGML/TEI. Is there any coherent explanation about where one
stores a style sheet
Doesn't matter so long as your href above points at it. The argument
MUST be a URI though (eg file:///C:/foo/bar.xsl not C:\foo\bar.xsl)
Post by Eve M. Behr
(and how to know you have the right one)
"Right one" is determined by whoever sent you the documents, or by your
own decision as to what you want to do with them. There is no one
goddess-given format that fits all documents (but TEI is principally
used for transcriptions of literary and historical documents, so there
would be expected to be a stylesheet that would recreate the original
layout insofar as a browser is capable of it). And because XSLT can
output plaintext like LaTeX code, you could have one which reproduced a
typographic facsimile.
Post by Eve M. Behr
and how
one gets a piece of XML to read said style sheet to produce coherent
copy?
A browser will honor the <?xml-stylesheet...?> PI and apply the styling
to the document as much as browsers are capable of doing so (they are
notoriously flaky handling XML, which is why all major TEI projects use
XML servers which do the transformation server-side).
Post by Eve M. Behr
Even the person who wrote "A Gentle Guide to XML" admitted that
the Guide had to be written in HTML because he had no idea how to
produce something readable in anything else.
Is this an Urban Myth? I can't see Michael or Lou authoring in HTML.
Post by Eve M. Behr
Surely the goal of the Text Encoding Initiative and SGML isn't that
Certainly not. The XML/SGML is the master storage format. You can
download it and write your own style for display, or use whatever a
particular project provides online (see for example http://celt.ucc.ie)
Post by Eve M. Behr
Instead of the poem within.
For an example of TEI encoding a Shakespearian sonnet in the original
Klingon, with markup in Elvish, see http://research.silmaril.ie/xml/
Post by Eve M. Behr
How do I get access to the poem as it was intended to be seen
Do we know how "it was intended to be seen"? By whom? The author (now
dead)? The publisher (which one)? The TEI project (I doubt they have a
view of how it "ought" to be seen).
Post by Eve M. Behr
without
having to learn Perl or other types of advanced programming?
XSLT and/or CSS. Unless someone has already produced a stylesheet,
you'll have to write one, preferably one that formats all that class of
poetry encoded in TEI.
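
By way of a sketch only, here is roughly what such a stylesheet could look like for the anthology sample quoted earlier in the thread. It assumes the element names from that sample (anthology, poem, title, stanza, line); real TEI Lite files normally mark verse with lg and l instead, so the match patterns would need adjusting for those.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="html"/>

  <!-- Wrap the whole anthology in an HTML page -->
  <xsl:template match="/anthology">
    <html>
      <head><title>Anthology</title></head>
      <body><xsl:apply-templates/></body>
    </html>
  </xsl:template>

  <!-- Each poem title becomes a heading -->
  <xsl:template match="poem/title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Each stanza becomes a paragraph -->
  <xsl:template match="stanza">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Each line of verse ends with a line break -->
  <xsl:template match="line">
    <xsl:apply-templates/><br/>
  </xsl:template>

</xsl:stylesheet>
```

Run through an XSLT processor, or referenced from the XML file with an xml-stylesheet line, this should show each poem as a heading followed by its stanzas, one verse line per row.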

Generally speaking, it's no business of the encoder to decide how the
reader "ought" to view a TEI transcription, because it's being produced
for many types of use, including data mining, linguistic analysis,
lexicography, grammatical analysis, and a bunch of other ographies where
visual appearance has no relevance whatever.

On the other hand it's courteous to produce a rudimentary view of the
files for the convenience of the user. The TEI manual doesn't do so
because it deals with explaining how to construct a transcription, not
how to view it.

Oh...if the files are SGML not XML then all bets are off. XSL[T] only
works with XML. You can either convert the SGML to XML (see the FAQ for
links) or use a transformer that handles SGML (eg Omnimark or Jade).
Post by Eve M. Behr
<admittedly a bit cranky and incoherent after several hours of trying
to crack the code>
Perfectly understandable. Have a read of the XML FAQ
(http://xml.silmaril.ie) and the XSL FAQ (http://www.dpawson.co.uk/xsl/)
and come back to us with questions.

///Peter
Eve M. Behr
2007-06-28 02:58:58 UTC
This is going to be a bit frustrating for you, but unfortunately I
don't have the programming background to understand quite what you're
attempting to describe:

On Wed, 27 Jun 2007 20:55:05 +0100, Peter Flynn
Post by Peter Flynn
Post by Eve M. Behr
But this XML/SGML stuff defeats me. What I have been doing is using
Word macros to convert back and forth between HTML and TEI. In this
way I have produced over a hundred pages of baseball statistics. I
use the HTML so that I can proof the tables (make sure that they're
rectangular and the headings are spanning correctly) and then submit
in TEI. But I would like to be able to deal with the XML (it looks
and behaves like XML) directly.
OK...you need the following (you probably have 1/2/3 already)
1. a copy of the TEI Lite DTD and/or Schema that the documents reference
I might have the DTD but I have no idea what to do with it. I pasted
the text into a notepad file and named it Teilite.dtd. No idea how it
fits into the picture. I have no idea what a Schema is.
Post by Peter Flynn
2. a knowledge of XML
I'm able to create tables that apparently pass validation (so the
project leader says--though he is the one who is telling the team to
trust him and forget about being able to see our work. He says it's
impossible to render XML human-readable). When I open it up with XML
Marker it shows nicely behaved collapsible trees with no error
indications. When the tags are converted to HTML tags the tables show
as nicely formatted tables.
Post by Peter Flynn
3. an understanding of the TEI Lite markup
TEI Lite looks like the sort of style sheets I'm familiar with from
working with Brian Reid's Scribe and with LaTeX. I like
it.
Post by Peter Flynn
4. an XML editor, preferably one with an XSLT IDE
I'm not sure I'd know what an XSLT IDE is. I did just download Emacs
as per http://www.tei-c.org/Software/tei-emacs/ but so far it just
looks like a notepad type editor (like a programmer IDE interface that
highlights variables and stuff -- I've done VBA coding). Aside from
that I have XML Marker and XML Notepad.
Post by Peter Flynn
5. a knowledge of XSLT
6. an XSLT processor (eg Saxon)
I tried reading the Saxon webpage and unfortunately none of it makes
any sense at all. I understand it's a command line language but
that's about all I could fathom from the page.
Post by Peter Flynn
7. time
I have lots of time, but lots of uses for it also (like preparing and
reading stuff that is human readable).
Post by Peter Flynn
XSLT is an XML processing language expressed in XML itself, for
transforming XML into other formats, including HTML, other XML formats,
and plaintext (eg LaTeX, CSV, etc).
So basically, you have to convert the XML into something else? Why
not simply write in that something else, like LaTeX, which already
imposes structure, can swap stylesheets, and has a compiler to
produce human readable text? Alternatively, how is this all that much
different from my writing replace macros in Word to convert from TEI
to HTML (e.g., <cell></cell> to <td></td>)?
Post by Peter Flynn
An XSLT processor applies your XSLT code to a specified XML file and
produces the output. XSLT2 can also take non-XML files as input and lets
you write code to generate XML (or other formats).
This is significantly more robust and reusable than any other method,
IMHO, including home-hacked Perl scripts, macros, and other converters
like Omnimark and Metamorphosis.
[XSL -- without the T -- does a similar job but produces PDF direct.
Personally I prefer to use XSLT to generate LaTeX, but I'm notorious.]
I love LaTeX, and I'm beginning to think I might want to invest my
time using it in its native form. I have MikTeX and feel really
spoiled because when I did LaTeX officially there were no screen
previews (you had to kill trees before you could see what your stuff
looked like). But some of the projects I want to work on for
Gutenberg are insisting on this TEI standard so ...
Post by Peter Flynn
<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "/dtds/teixlite.dtd">
<TEI.2>
<teiHeader>
<fileDesc>
<titleStmt>
<title>Demo</title>
</titleStmt>
<publicationStmt>
<distributor>Silmaril Consultants</distributor>
<availability>
<p>Unrestricted</p>
</availability>
</publicationStmt>
<sourceDesc>
<p>Original code</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div0>
<head>Demo of TEI Lite XML</head>
<p>Hello, world!</p>
</div0>
</body>
</text>
</TEI.2>
This I get. This is like the text I actually type in LaTeX. Looks
different but essentially the same sort of thing.
Post by Peter Flynn
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>
<xsl:value-of select="TEI.2/teiHeader/fileDesc/titleStmt/title"/>
</title>
</head>
<body>
<xsl:apply-templates select="TEI.2/text"/>
</body>
</html>
</xsl:template>
<xsl:template match="div0/head">
<h1>
<xsl:apply-templates/>
</h1>
</xsl:template>
<xsl:template match="p">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
</xsl:stylesheet>
This I recognize also, though I only dipped into the Scribe/LaTeX
databases because I was unhappy with what the designers thought
constituted a proper manuscript. Once I was happy, I spent most of my
time cranking out manuscript.
Post by Peter Flynn
$ java -jar /usr/local/saxon/b8.5/saxon8.jar -o test.html test.xml test.xsl
Here you lose me completely. Looks like a conversion utility though.
Post by Peter Flynn
you get the following output
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Demo</title>
</head>
<body>
<h1>Demo of TEI Lite XML</h1>
<p>Hello, world!</p>
</body>
</html>
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<snipped innards>
</xsl:stylesheet>
Here's what I need explained. Obviously these are two files (the
style sheet and the document). Where do I put them and what are their
extensions? And how do I open up the document in a way that it sees
the stylesheet (Explorer won't do it and neither will Notepad).
Post by Peter Flynn
a. use a browser that reads XML (Firefox, MSIE, DocZilla, etc)
b. open the file in an editor
c. have the file served to you by an XML server (eg AxKit, Cocoon,
PropelX, etc)
But as you have already deduced, you need a stylesheet to express how
you want it to look, because XML typically does not carry styling
information, only content-descriptive markup.
OK, again, assume I have the stylesheet, how do I point the document
at it in a way that the browser will understand? The browser, when
it reads the XML file, either gives me the collapsible tree or the text
all muddled together. Nothing that looks like anything.
Post by Peter Flynn
This can be done with CSS, provided you don't want to change the order
of the document, or do anything cute like generating a table of
contents. For that you have to use XSLT, because it processes the file
and can therefore reach into it and cherry-pick the bits you want where
you want them, which CSS can't do. You can of course also apply CSS post
hoc to the generated HTML.
Right now I'd be happy to see a section of text with something bolded
and perhaps an inset quote.
Post by Peter Flynn
Post by Eve M. Behr
I have downloaded a stylesheet that seems to have the correct tags, but
nothing I do with my browser will pull up a TEI book so that I can see
the tables and the poetry and the formatted block quotes as they
should be.
Is it a CSS stylesheet or an XSLT stylesheet?
It's a CSS stylesheet. I haven't encountered XSLT stylesheets. But
the CSS stylesheet seemed to have all the tags I had in my document.
Post by Peter Flynn
Add ONE of the following to the XML document if it's not already there,
<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<?xml-stylesheet href="foo.css" type="text/css"?>
Tried that. Again, both Notepad and Explorer just mashed the text
together or showed raw code.
Post by Peter Flynn
Post by Eve M. Behr
Like I say, there is tons of material about getting stuff *into*
XML/SGML/TEI. Is there any coherent explanation about where one
stores a style sheet
Doesn't matter so long as your href above points at it. The argument
MUST be a URI though (eg file:///C:/foo/bar.xsl not C:\foo\bar.xsl)
Post by Eve M. Behr
(and how to know you have the right one)
"Right one" is determined by whoever sent you the documents, or by your
own decision as to what you want to do with them. There is no one
goddess-given format that fits all documents (but TEI is principally
used for transcriptions of literary and historical documents, so there
would be expected to be a stylesheet that would recreate the original
layout insofar as a browser is capable of it). And because XSLT can
output plaintext like LaTeX code, you could have one which reproduced a
typographic facsimile.
OK, skip "right" one. How about any stylesheet that will produce
something recognizably formatted? Like with LaTeX, when I started I
was just happy if I could produce stuff with recognizable headings and
equations that looked like equations with the proper structure. It
was only later that I worked out stylesheets that didn't reflect
Lamport's silly prejudices and instead reflected *my* silly
prejudices.
Post by Peter Flynn
A browser will honor the <?xml-stylesheet...?> PI and apply the styling
to the document as much as browsers are capable of doing so (they are
notoriously flaky handling XML, which is why all major TEI projects use
XML servers which do the transformation server-side).
Post by Eve M. Behr
Even the person who wrote "A Gentle Guide to XML" admitted that
the Guide had to be written in HTML because he had no idea how to
produce something readable in anything else.
Is this an Urban Myth? I can't see Michael or Lou authoring in HTML.
Could have been a different "Gentle Guide". Just something I stumbled
upon in my "wee hours I'm not going to bed till I figure something
out" sessions.
Post by Peter Flynn
Post by Eve M. Behr
Surely the goal of the Text Encoding Initiative and SGML isn't that
Certainly not. The XML/SGML is the master storage format. You can
download it and write your own style for display, or use whatever a
particular project provides online (see for example http://celt.ucc.ie)
OK, I tried this site and downloaded their Doczilla in response to
their statement "Your designated program for SGML files will open the
file." combined with the fact that Doczilla seemed to be what they
were talking about. Doczilla did open up the SGML file I downloaded
for testing. But, again, it only showed me lots and lots of code. I
assume that the site wants you to use their converted HTML files and
leave the SGML files for computer experts.
Post by Peter Flynn
Post by Eve M. Behr
Instead of the poem within.
For an example of TEI encoding a Shakespearian sonnet in the original
Klingon, with markup in Elvish, see http://research.silmaril.ie/xml/
Post by Eve M. Behr
How do I get access to the poem as it was intended to be seen
Do we know how "it was intended to be seen"? By whom? The author (now
dead)? The publisher (which one)? The TEI project (I doubt they have a
view of how it "ought" to be seen).
By someone who isn't a computer programmer? With text actually bolded
instead of tags indicating that it should be bolded, and no need to
just "use your imagination". I hate to start sounding snippy again,
but I am beginning to hate the sight of XML tags.
Post by Peter Flynn
Post by Eve M. Behr
without
having to learn Perl or other types of advanced programming?
XSLT and/or CSS. Unless someone has already produced a stylesheet,
you'll have to write one, preferably one that formats all that class of
poetry encoded in TEI.
I'm sure people have produced stylesheets, but I clearly am extracting
the information wrong. (I can't find style sheet files, just pages of
code and I try to guess where the file starts and ends).
Post by Peter Flynn
Generally speaking, it's no business of the encoder to decide how the
reader "ought" to view a TEI transcription, because it's being produced
for many types of use, including data mining, linguistic analysis,
lexicography, grammatical analysis, and a bunch of other ographies where
visual appearance has no relevance whatever.
At this point I'm not interested in data-mining, though I do
understand why scholars might want to do that.
Post by Peter Flynn
On the other hand it's courteous to produce a rudimentary view of the
files for the convenience of the user. The TEI manual doesn't do so
because it deals with explaining how to construct a transcription, not
how to view it.
Oh...if the files are SGML not XML then all bets are off. XSL[T] only
works with XML. You can either convert the SGML to XML (see the FAQ for
links) or use a transformer that handles SGML (eg Omnimark or Jade).
Post by Eve M. Behr
<admittedly a bit cranky and incoherent after several hours of trying
to crack the code>
Perfectly understandable. Have a read of the XML FAQ
(http://xml.silmaril.ie) and the XSL FAQ (http://www.dpawson.co.uk/xsl/)
and come back to us with questions.
///Peter
Well Peter. Through the course of typing this response (the last
three hours) I have tried various things. I even thought I finally
had the solution. Unfortunately, all I can get is "This XML file does
not appear to have any style information associated with it. The
document tree is shown below." And then the hated tree structure below
it. The XML FAQ seemed to be the most helpful, however all their
examples are to documents on their website and not on a hard drive.
They also don't show their style sheet. I'll have to look at other
pages and hope I land on one that uses an intro line that corresponds
to my setup.

I think I'll take a break and do something easy (like transcribe some
maths into LaTeX). But I will figure it out somehow. And when I do I
will then be faced with the task of explaining it to people who would
like to help with the project, but have even less computer background
than I do.

Once I figure some more out I might be able to ask decent questions.

Thanks very much again,

Eve M. Behr
Wilfried Hennings
2007-06-28 08:18:19 UTC
Hello,

what about using a commercial XML authoring system?
e.g. XMetaL

A long time ago, I ordered HoTMetaL for HTML authoring. What I like best
are the different views: source (of course), tags on, normal, and page
preview.
Another feature I like: when editing in normal view or tags-on view,
inserting a tag automatically inserts the corresponding end tag, so that
valid HTML syntax is always guaranteed.
Unfortunately (for me) the company (Softquad, now named Just Systems)
stopped HoTMetaL development and concentrated on the XML editor XMetaL
which has similar features.
You can download a trial version.
See http://na.justsystems.com/

Hope this helps.

--
Wilfried Hennings
email me: change "nospam" to "w.hennings"
All opinions mentioned are strictly my own, not my employer's.
Eve M. Behr
2007-06-28 14:01:57 UTC
On Thu, 28 Jun 2007 10:18:19 +0200, Wilfried Hennings
Post by Wilfried Hennings
Hello,
what about using a commercial XML authoring system?
e.g. XMetaL
A long time ago, I ordered HoTMetaL for HTML authoring. What I like best
are the different views: source (of course), tags on, normal, and page
preview.
Another feature I like: when editing in normal view or tags-on view,
inserting a tag automatically inserts the corresponding end tag, so that
valid HTML syntax is always guaranteed.
Unfortunately (for me) the company (Softquad, now named Just Systems)
stopped HoTMetaL development and concentrated on the XML editor XMetaL
which has similar features.
You can download a trial version.
See http://na.justsystems.com/
Hope this helps.
XMetal costs about $700 U.S. You expect me to tell the people who
want to join me in *volunteer* coding books for Gutenberg to spend
that to help out with a few pages? You expect me to pay that to see
the books I am downloading from the internet? I am not looking to be
an XML programmer. I just want to see what it is I am really creating
when I help create a book. I am *not* interested in working on stuff
that requires a senior programming certificate or a bank full of money
to see.

So what you're telling me is that all these people who are setting up
books in XML are in fact locking them away from the people without the
expertise or the cash to read them?

Somehow I didn't think the goals for the TEI Consortium were that
elitist.

Eve M. Behr
Wilfried Hennings
2007-06-28 14:15:21 UTC
Post by Eve M. Behr
XMetal costs about $700 U.S. You expect me to tell the people who
want to join me in *volunteer* coding books for Gutenberg to spend
that to help out with a few pages? You expect me to pay that to see
the books I am downloading from the internet? I am not looking to be
an XML programmer. I just want to see what it is I am really creating
when I help create a book. I am *not* interested in working on stuff
that requires a senior programming certificate or a bank full of money
to see.
So what you're telling me is that all these people who are setting up
books in XML are in fact locking them away from the people without the
expertise or the cash to read them?
Somehow I didn't think the goals for the TEI Consortium were that
elitist.
Sorry, I overlooked your first post and thought you needed it for paid
work.


--
Wilfried Hennings
email me: change "nospam" to "w.hennings"
All opinions mentioned are strictly my own, not my employer's.
Shmuel (Seymour J.) Metz
2007-07-15 23:54:59 UTC
Post by Eve M. Behr
So what you're telling me is that all these people who are setting up
books in XML are in fact locking them away from the people without
the expertise or the cash to read them?
One thing that I have learned over the years is that when someone
writes "what you're telling me", it's dollars to donuts that they will
follow it with something totally imaginary. As in this case.
Post by Eve M. Behr
Somehow I didn't think the goals for the TEI Consortium were that
elitist.
Whether or not their goals are elitist, they're not doing what you
accuse them of. If they are mandating SGML or XML based markup, it's
because they understand the issues better than you do. Whether your
manager could have done a better job of communicating with you is a
separate issue.

BTW, I've seen similar comments from people who didn't want to use
markup languages. Consider what you would tell them, then apply it to
your case.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to ***@library.lspace.org
Peter Flynn
2007-07-16 19:31:55 UTC
Eve M. Behr wrote:
[...]
Post by Eve M. Behr
So what you're telling me is that all these people who are setting up
books in XML are in fact locking them away from the people without the
expertise or the cash to read them?
Exactly the opposite. As far as the document is concerned, it's
irrelevant what software you use, a $5000 whoop-de-doo commercial
editor, or Emacs for free. They'll both create an XML document, and the
document format (XML) is open and requires no expenditure to read.

///Peter

Peter Flynn
2007-06-29 22:54:45 UTC
Post by Eve M. Behr
This is going to be a bit frustrating for you, but unfortunately I
don't have the programming background to understand quite what you're
[...]
Post by Eve M. Behr
Post by Peter Flynn
1. a copy of the TEI Lite DTD and/or Schema that the documents reference
I might have the DTD but I have no idea what to do with it.
You shouldn't need to do anything with it except to ensure that it is in
the location specified at the top of the XML files. If your XML files say

<!DOCTYPE TEI.2 SYSTEM "teilitex.dtd">

then software will expect the DTD (that file) to be in the same
directory as the XML file. If the XML file says

<!DOCTYPE TEI.2 SYSTEM "file:///C:/mydtds/teilitex.dtd">

then the DTD must be in C:\mydtds
Post by Eve M. Behr
I pasted
the text into a notepad file and named it Teilite.dtd.
OK so long as that's what the XML files refer to it as (including the
capital T...URIs are case-sensitive).
Post by Eve M. Behr
No idea how it
fits into the picture.
Software reads that first line, looks to find the DTD where it says, and
then reads it. That's all you need.
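
If you want to check where software will go looking, here is a minimal Python sketch (standard library only). It assumes the DOCTYPE uses a SYSTEM identifier like the two examples above; the function name expected_dtd_path and the file name book.xml are made up for illustration, and PUBLIC identifiers are not handled.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname

# Matches declarations such as <!DOCTYPE TEI.2 SYSTEM "teilitex.dtd">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+(["\'])(.*?)\1')


def expected_dtd_path(xml_path):
    """Return the path where the DTD is expected, or None if no SYSTEM DOCTYPE."""
    xml_path = Path(xml_path).resolve()
    # The DOCTYPE sits near the top, so the first few KB are enough.
    head = xml_path.read_text(encoding="utf-8", errors="replace")[:4096]
    match = DOCTYPE_RE.search(head)
    if match is None:
        return None
    # A relative identifier is resolved against the XML file's own location;
    # an absolute file:/// identifier is used as it stands.
    resolved = urljoin(xml_path.as_uri(), match.group(2))
    return Path(url2pathname(urlparse(resolved).path))


# Demo with a throwaway file (hypothetical name) in a temporary directory.
workdir = Path(tempfile.mkdtemp()).resolve()
book = workdir / "book.xml"
book.write_text(
    '<?xml version="1.0"?>\n<!DOCTYPE TEI.2 SYSTEM "teilitex.dtd">\n<TEI.2/>\n',
    encoding="utf-8",
)
dtd = expected_dtd_path(book)
print(dtd, "exists" if dtd.exists() else "is missing")
```

If it reports the DTD as missing, either move the DTD or change the DOCTYPE line so the two agree, including upper and lower case.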
Post by Eve M. Behr
I have no idea what a Schema is.
Forget it for the moment. It's an alternative way of expressing the same
information.
Post by Eve M. Behr
Post by Peter Flynn
2. a knowledge of XML
I'm able to create tables that apparently pass validation (so the
project leader says--
Excellent.
Post by Eve M. Behr
though he is the one who is telling the team to trust him and forget
about being able to see our work.
Silly man. Technically he's right: if the documents you create are
correctly constructed (valid) *and* you've put the right data in the
right places, then all will indeed be well when it gets formatted.

But most people like to see how it is progressing. One (crude) way is to
write a CSS stylesheet to explain to your browser what gets formatted
and how. This requires a knowledge of CSS. To get a more complex format
you would need to write an XSLT stylesheet...more learning, more work.
Post by Eve M. Behr
He says it's impossible to render XML human-readable.
There is a technical term for this: bullshit. I render XML
human-readable all day, every day. Look at the XML FAQ for an example.

OK, a vast table of impenetrably complex data intended for the internal
management of, say, a numerically-controlled turret lathe, is not going
to be meaningful if you rendered it human-readable. But normal text
documents are easily done.
Post by Eve M. Behr
When I open it up with XML Marker it shows
nicely behaved collapsible trees with no error indications.
So far so good.
Post by Eve M. Behr
When the tags are converted to HTML tags the tables show as nicely
formatted tables.
So far so better.
Post by Eve M. Behr
Post by Peter Flynn
3. an understanding of the TEI Lite markup
TEI Lite looks like the sort of style sheets I'm familiar with from
working with Brian Reid's Scribe and with working with LaTeX. I like
it.
Yep, it's fine. It has its limitations: it's intended for the
transcription of literary and historical documents, so it's not a good
choice for the documentation of a nuclear power station. But a large
number of other common DTDs work in a very similar manner, just with
different names for the element types.
Post by Eve M. Behr
Post by Peter Flynn
4. an XML editor, preferably one with an XSLT IDE
I'm not sure I'd know what an XSLT IDE is.
An Integrated Development Environment for the Extensible Stylesheet
Language (Transformations). Basically a glorified editor that lets you
write stylesheets and see them applied to a document, all from the one
screen. Just a convenience.
Post by Eve M. Behr
I did just download Emacs
as per http://www.tei-c.org/Software/tei-emacs/ but so far it just
looks like a notepad type editor (like a programmer IDE interface that
hilights variables and stuff -- I've done VBA coding). Aside from
that I have XML Marker and XML Notepad.
But if they've included the right add-ons in that distribution (which I
*think* they did, knowing the people who did it), it ought to understand
the markup. If you open your XML file, does it colorise the markup and
switch to XML-mode (check the status line at the bottom of the window)?
Does it have a "Markup" menu? It should. Does it complain that it can't
find TEI.2 (that means your DTD is not where the XML file says it ought
to be)? If it doesn't automatically spot the fact that it's XML, then
either the distribution package doesn't have the right bells and
whistles, or it has not installed itself properly. Last time I used it,
however, it all seemed to work fine. I use Emacs all the time for all my
XML editing.
Post by Eve M. Behr
Post by Peter Flynn
5. a knowledge of XSLT
6. an XSLT processor (eg Saxon)
I tried reading the Saxon webpage and unfortunately none of it makes
any sense at all.
You don't need to understand any of it, just install the software. It's
written in Java, so you'll need an up-to-date Java (download it from
java.sun.com).
Post by Eve M. Behr
I understand it's a command line language but
that's about all I could fathom from the page.
Not a language. Saxon is just a processor. It takes an XML file and an
XSLT stylesheet and spits out whatever format the stylesheet says to.
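
To make "processor" less mysterious, here is a toy Python sketch of the idea. It is not XSLT and not how Saxon works internally: it just walks a small TEI-like fragment and renames elements according to a rule table. The RULES mapping and the sample text are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: TEI element name -> HTML element name.
RULES = {"text": "div", "head": "h2", "p": "p", "hi": "b", "q": "blockquote"}


def transform(source):
    """Copy an element tree, renaming each element according to RULES."""
    target = ET.Element(RULES.get(source.tag, "span"))
    target.text = source.text   # text before the first child
    target.tail = source.tail   # text following this element
    for child in source:
        target.append(transform(child))
    return target


TEI = (
    "<text><head>Notes</head>"
    '<p>Some <hi rend="bold">bold</hi> words.</p>'
    "<q>An inset quote.</q></text>"
)

html = ET.tostring(transform(ET.fromstring(TEI)), encoding="unicode")
print(html)
```

A real stylesheet does the same kind of thing, only with far more control: it can reorder, number, select, and generate content, which a plain renaming table cannot.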
Post by Eve M. Behr
Post by Peter Flynn
7. time
I have lots of time, but lots of uses for it also (like preparing and
reading stuff that is human readable).
I have a shortage of time, and like to spend what I have reading and
drinking wine :-)
Post by Eve M. Behr
Post by Peter Flynn
XSLT is an XML processing language expressed in XML itself, for
transforming XML into other formats, including HTML, other XML formats,
and plaintext (eg LaTeX, CSV, etc).
So basically, you have to convert the XML into something else?
Not necessarily, but usually.
Post by Eve M. Behr
Why
not simply write in that something else, like LaTeX which already
imposes structure and can swap stylesheets.
Because XML is much more robust than LaTeX. XML is a *provable* format:
any validator can test an XML file against the DTD and say if it's right
or not. The only program that will really check a LaTeX file is TeX
itself, and all it can do is typeset it (which it does really well).

LaTeX's structuring is non-rigorous (limp and floppy). In LaTeX, I can write

\section{stuff}
blah blah
\subsubsection{more stuff}
blah blah

and get

1 stuff

blah blah

1.0.1 more stuff

blah blah

which is obviously wrong, but LaTeX won't stop you. In any properly
constructed DTD you simply can't do that. And with XML you can do all kinds
of other stuff, like retrieve the third para of the fourteenth
subsection of section 3 of chapter 4, which is virtually impossible in
LaTeX.
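
As a rough illustration of that kind of retrieval, here is a Python sketch using the standard library's ElementTree, which understands a small subset of XPath. It assumes chapters, sections and subsections are marked up as div1, div2 and div3 (the numbered TEI divisions); the sample document is generated on the spot and is entirely made up.

```python
import xml.etree.ElementTree as ET


def build_sample():
    """Build a made-up body: 4 chapters x 3 sections x 14 subsections x 3 paras."""
    body = ET.Element("body")
    for c in range(1, 5):
        chapter = ET.SubElement(body, "div1", n=str(c))
        for s in range(1, 4):
            section = ET.SubElement(chapter, "div2", n=str(s))
            for ss in range(1, 15):
                subsection = ET.SubElement(section, "div3", n=str(ss))
                for p in range(1, 4):
                    para = ET.SubElement(subsection, "p")
                    para.text = (
                        f"chapter {c}, section {s}, subsection {ss}, para {p}"
                    )
    return body


body = build_sample()

# Third para of the fourteenth subsection of section 3 of chapter 4.
target = body.find("./div1[4]/div2[3]/div3[14]/p[3]")
print(target.text)
```

The path expression reads almost exactly like the English sentence, which is the point: the structure is in the markup, so software can navigate it.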
Post by Eve M. Behr
And has a compiler to
produce human readable text. Alternatively, how is this all that much
different from my writing replace macros in Word to convert from TEI
(e.g., <cell></cell> to HTML <td></td>).
Because -- for example -- perhaps some kinds of table aren't really
meant to be rendered as tables? See the last paragraph of
http://xml.silmaril.ie/appendix/glossary/#tables

<table>
<tr>
<th>Chocolate</th>
<td>A major food group</td>
</tr>
<tr>
<th>XML</th>
<td>A non-edible language</td>
</tr>
</table>

should probably best be rendered as

\begin{description}
\item[Chocolate] A major food group
\item[XML] A non-edible language
\end{description}

Don't fall into the trap of assuming that things only ever have one
possible rendering. You also might not want to print your document, but
speak it through an audio generator. Or format it for some other
purpose. LaTeX can be considered an extreme case of premature binding.
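
A minimal Python sketch of that alternative rendering, assuming every row holds exactly one th and one td as in the example; the function name table_to_description is made up, and no escaping of LaTeX special characters is attempted.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr><th>Chocolate</th><td>A major food group</td></tr>
<tr><th>XML</th><td>A non-edible language</td></tr>
</table>"""


def table_to_description(table_xml):
    """Render a two-column term/definition table as a LaTeX description list."""
    table = ET.fromstring(table_xml)
    lines = [r"\begin{description}"]
    for row in table.iter("tr"):
        term = row.findtext("th", default="").strip()
        definition = row.findtext("td", default="").strip()
        lines.append(rf"\item[{term}] {definition}")
    lines.append(r"\end{description}")
    return "\n".join(lines)


latex = table_to_description(TABLE)
print(latex)
```

The same source could just as easily be sent to an HTML table or a spoken list; only the output side changes.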

There's also the problem that different markup elements mean different
things in different places. <head> inside <table> is actually a caption,
whereas <head> inside a <div3> is a subsubsection title. Trying to do
that with Word macros leads to insanity.
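
Here is a small Python sketch of why context matters, assuming a made-up fragment with a head inside a div3 and another inside a table. The choice of HTML names (h3, caption, p) is only an illustration, not anyone's official mapping.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<div3><head>Batting averages</head>"
    "<table><head>Season totals</head>"
    "<row><cell>1</cell></row></table></div3>"
)


def html_name_for_head(parent_tag):
    """Pick an HTML element for <head> depending on where it occurs."""
    if parent_tag == "table":
        return "caption"   # a table's head is really a caption
    if parent_tag == "div3":
        return "h3"        # a division's head is a title
    return "p"             # fallback for anything unexpected


root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}

decisions = [
    (parent_of[head].tag, head.text, html_name_for_head(parent_of[head].tag))
    for head in root.iter("head")
]
for parent_tag, text, html_tag in decisions:
    print(f"<head> in <{parent_tag}> ({text!r}) becomes <{html_tag}>")
```

A blind search-and-replace sees only the string "head" and cannot tell these two cases apart.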
Post by Eve M. Behr
I love LaTeX, and I'm beginning to think I might want to invest my
time using it in its native form.
I use it all the time...for formatting my XML. But my master copies are
always tucked away safely in XML, not LaTeX.
Post by Eve M. Behr
I have MikTeX and feel really
spoiled because when I did LaTeX officially there were no screen
previews (you had to kill trees before you could see what your stuff
looked like).
That must have been nearly as long ago as when I started using it. My
first previewer was a Tektronix graphics storage tube. Even the very
first DOS versions of TeX had DVI viewers for the CGA and Hercules
graphics cards.
Post by Eve M. Behr
But some of the projects I want to work on for
Gutenberg are insisting on this TEI standard so ...
Preservability and reusability. Trust me on this.

[xml example]
Post by Eve M. Behr
This I get. This is like the text I actually type in LaTeX. Looks
different but essentially the same sort of thing.
OK.

[xslt example]
Post by Eve M. Behr
This I recognize also, though I only dipped into the Scribe/LaTeX
databases because I was unhappy with what the designers thought
constituted a proper manuscript. Once I was happy, I spent most of my
time cranking out manuscript.
OK too.
Post by Eve M. Behr
Post by Peter Flynn
$ java -jar /usr/local/saxon/b8.5/saxon8.jar -o test.html test.xml test.xsl
Here you lose me completely. Looks like a conversion utility though.
That's exactly what it is. With an IDE (remember IDEs) you don't have to
type this stuff, just click on the button.

[xslt latex example]
Post by Eve M. Behr
Here's what I need explained. Obviously these are two files (the
style sheet and the document). Where do I put them and what are their
extensions?
yourfile.xml is the XML file

whatever.xsl is an XSLT stylesheet (maybe called tei2html.xsl or
tei2latex.xsl)
Post by Eve M. Behr
And how do I open up the document in a way that it sees
the stylesheet (Explorer won't do it and neither will Notepad).
<?xml-stylesheet href="tei2latex.xsl" type="text/xsl"?>

inserted in your XML document (at the top, between the DOCTYPE line and
the <TEI.2> start-tag).
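
If a browser only shows the raw tree, a quick way to see whether the file really carries such a line is a few lines of Python with the standard library's expat parser. This is only a sketch; the function name stylesheet_instructions and the sample documents are made up.

```python
from xml.parsers import expat


def stylesheet_instructions(xml_text):
    """Return the data of every xml-stylesheet processing instruction found."""
    found = []

    def on_pi(target, data):
        if target == "xml-stylesheet":
            found.append(data)

    parser = expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return found


PI_LINE = '<?xml-stylesheet href="tei2latex.xsl" type="text/xsl"?>\n'
WITH_PI = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE TEI.2 SYSTEM "teilitex.dtd">\n'
    + PI_LINE
    + "<TEI.2><text><body><p>Hello</p></body></text></TEI.2>\n"
)
WITHOUT_PI = WITH_PI.replace(PI_LINE, "")

print(stylesheet_instructions(WITH_PI))
print(stylesheet_instructions(WITHOUT_PI))
```

An empty list means the document names no stylesheet at all, so a browser has nothing to apply.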

Explorer, Firefox, and most other modern browsers will work with this,
but their implementations are incomplete and clumsy. Only a few XML
editors can render using XSLT in real time (because XML processing
normally needs the whole document to be complete -- for example to
resolve ID/IDREF links [similar to \label and \ref] -- and in an
*editor*, designed for writing a document, a file might simply not yet
be complete...and thus partially unrenderable).
Post by Eve M. Behr
Post by Peter Flynn
a. use a browser that reads XML (Firefox, MSIE, DocZilla, etc)
b. open the file in an editor
c. have the file served to you by an XML server (eg AxKit, Cocoon,
PropelX, etc)
But as you have already deduced, you need a stylesheet to express how
you want it to look, because XML typically does not carry styling
information, only content-descriptive markup.
OK, again, assume I have the stylesheet, how do I point the document
at it in a way that the browser will understand it.
Use the <?xml-stylesheet...?> Processing Instruction as above, or use an
external processor like Saxon, as above or in an IDE. Or write some
chunk of code in XML.NET which opens the document, binds the stylesheet,
and runs whatever it is that .NET runs to do XSLT (forgive me, I'm not a
Windows user).
Post by Eve M. Behr
The browser when
it reads the XML file either gives me the collapsible tree or the text
all muddled together. Nothing that looks like anything.
<?xml-stylesheet...?> is your friend...but be warned that browser XSLT
is flaky. For an example, see http://xml.silmaril.ie/hotels.xml
Post by Eve M. Behr
Post by Peter Flynn
This can be done with CSS, provided you don't want to change the order
of the document, or do anything cute like generating a table of
contents. For that you have to use XSLT, because it processes the file
and can therefore reach into it and cherry-pick the bits you want where
you want them, which CSS can't do. You can of course also apply CSS post
hoc to the generated HTML.
Right now I'd be happy to see a section of text with something bolded
and perhaps an inset quote.
Try that hotels.xml link above.
Post by Eve M. Behr
Post by Peter Flynn
Post by Eve M. Behr
I have downloaded a stylesheet that seems to have the correct tags, but
nothing I do with my browser will pull up a TEI book so that I can see
the tables and the poetry and the formatted block quotes as they
should be.
Is it a CSS stylesheet or an XSLT stylesheet?
It's a CSS stylesheet. I haven't encountered XSLT stylesheets. But
the CSS stylesheet seemed to have all the tags I had in my document.
Post by Peter Flynn
Add ONE of the following to the XML document if it's not already there,
<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<?xml-stylesheet href="foo.css" type="text/css"?>
Tried that. Again, both Notepad, and Explorer just mashed the text
together or showed raw code.
Forget Notepad utterly. It's completely dumb and won't do anything
except write shopping lists. Explorer should do it, but it may need the
documents to be served from a server, not opened locally. Install
Firefox and try that instead.
Post by Eve M. Behr
OK, skip "right" one. How about any stylesheet that will produce
something recognizably formatted.
If you have a CSS stylesheet that someone claims will format your
document, ask them how to do it. If you want to try it yourself, create
myfile.xml:

<?xml version="1.0"?>
<?xml-stylesheet href="test.css" type="text/css"?>
<doc>
<title>Hello world!</title>
<text>A test</text>
</doc>

and test.css

title { display:block; font-weight:bold; font-size:24pt }
text { display:block; margin-top:12pt; font-size:12pt; }

and open myfile.xml. But as I said, you may have to do this through a
web server if your browser doesn't handle XML when opened from disk.
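
If you don't have a web server handy, Python's built-in one will do for testing. The sketch below writes the two test files into a temporary directory and defines a helper to serve them locally; the helper names, the port 8000 and the 127.0.0.1 address are arbitrary choices for illustration.

```python
import functools
import http.server
import tempfile
from pathlib import Path

MYFILE_XML = """<?xml version="1.0"?>
<?xml-stylesheet href="test.css" type="text/css"?>
<doc>
<title>Hello world!</title>
<text>A test</text>
</doc>
"""

TEST_CSS = """title { display:block; font-weight:bold; font-size:24pt }
text { display:block; margin-top:12pt; font-size:12pt; }
"""


def write_test_files(directory):
    """Write myfile.xml and test.css side by side in the given directory."""
    directory = Path(directory)
    (directory / "myfile.xml").write_text(MYFILE_XML, encoding="utf-8")
    (directory / "test.css").write_text(TEST_CSS, encoding="utf-8")
    return directory


def make_server(directory, port=8000):
    """Create (but do not start) a local HTTP server for the directory."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


site = write_test_files(tempfile.mkdtemp())
print("Test files written to", site)
# To browse them, run:  make_server(site).serve_forever()
# then open http://127.0.0.1:8000/myfile.xml and stop with Ctrl-C.
```

The stylesheet is found because href="test.css" is relative, so it only needs to sit in the same directory as myfile.xml.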
Post by Eve M. Behr
Like with LaTeX, when I started I
was just happy if I could produce stuff with recognizable headings and
equations that looked like equations with the proper structure. It
was only later that I worked out stylesheets that didn't reflect
Lamport's silly prejudices and instead reflected *my* silly
prejudices.
We all do this...
Post by Eve M. Behr
Post by Peter Flynn
A browser will honor the <?xml-stylesheet...?> PI and apply the styling
to the document as much as browsers are capable of doing so (they are
notoriously flaky handling XML, which is why all major TEI projects use
XML servers which do the transformation server-side).
Post by Eve M. Behr
Even the person who wrote "A Gentle Guide to XML" admitted that
the Guide had to be written in HTML because he had no idea how to
produce something readable in anything else.
Is this an Urban Myth? I can't see Michael or Lou authoring in HTML.
Could have been a different "Gentle Guide".
From www.tei-c.org it should be one of the chapters in the TEI
Guidelines. I'm not aware of any other one.

Post by Eve M. Behr
Just something I stumbled
along in my "wee hours I'm not going to bed till I figure something
out" sessions.
Post by Peter Flynn
Post by Eve M. Behr
Surely the goal of the Text Encoding Initiative and SGML isn't that
Certainly not. The XML/SGML is the master storage format. You can
download it and write your own style for display, or use whatever a
particular project provides online (see for example http://celt.ucc.ie)
OK, I tried this site and downloaded their Doczilla in response to
their statement "Your designated program for SGML files will open the
file." combined with the fact that Doczilla seemed to be what they
were talking about. Doczilla did open up the SGML file I downloaded
for testing. But, again, it only showed me lots and lots of code.
I must check that. It worked last time I tried it with the Panorama
plugin for Netscape 3 (which has the same fundamentals as DocZilla) but
it's possible that things have deteriorated: those web pages are in
transition.
Post by Eve M. Behr
I assume that the site wants you to use their converted HTML files and
leave the SGML files for computer experts.
No, for experts in Irish literature and history. But the SGML files are
in the process of being converted to XML, and a new server is being
installed which will do much more, using XSLT, than was possible in SGML
using Omnimark.
Post by Eve M. Behr
Post by Peter Flynn
Post by Eve M. Behr
Instead of the poem within.
For an example of TEI encoding a Shakespearian sonnet in the original
Klingon, with markup in Elvish, see http://research.silmaril.ie/xml/
Post by Eve M. Behr
How do I get access to the poem as it was intended to be seen
Do we know how "it was intended to be seen"? By whom? The author (now
dead)? The publisher (which one)? The TEI project (I doubt they have a
view of how it "ought" to be seen).
By someone who isn't a computer programmer? With text being bolded
instead of tags indicating it should be bolded
That's what stylesheets do. Exactly the same principle as in LaTeX.
Post by Eve M. Behr
and just "use your
imagination".
I don't think I've ever said that. I assume this is someone else.
Post by Eve M. Behr
I hate to start sounding snippy again, but I am
beginning to hate the sight of XML tags.
I think you have been neglected here. Let's try to repair it.
Post by Eve M. Behr
I'm sure people have produced stylesheets, but I clearly am extracting
the information wrong. (I can't find style sheet files, just pages of
code and I try to guess where the file starts and ends).
I'm not sure where you're looking, but try
http://www.tei-c.org/Stylesheets/teic/

But you will need to install Saxon or a similar processor to do this.
I've never heard of anyone using CSS to format TEI (like using a
teaspoon to move a mountain) but it's possible, just infinitely fiddly
and tedious.
Post by Eve M. Behr
Well Peter. Through the course of typing this response (the last
three hours) I have tried various things. I even thought I finally
had the solution. Unfortunately, all I can get is "This XML file does
not appear to have any style information associated with it. The
document tree is shown below."
<?xml-stylesheet...?>
Post by Eve M. Behr
And then the hated tree structure below
it. The XML FAQ seemed to be the most helpful, however all their
examples are to documents on their website and not on a hard drive.
Right. That's what XML was designed for, although it works perfectly
well off disk. But your *browser* may not. My Firefox seems to be OK.
Post by Eve M. Behr
They also don't show their style sheet. I'll have to look at other
pages and hope I land on one that uses an intro line that corresponds
to my setup.
Try those two test files above.
Post by Eve M. Behr
I think I'll take a break and do something easy (like transcribe some
maths into LaTeX). But I will figure it out somehow. And when I do I
will then be faced with the task of explaining it to people who would
like to help with the project, but have even less computer background
than I do.
More documentation is *always* needed. But XML requires absolute
precision: what you specify (names, directories, etc) must be 100%
correct or nothing at all will work.
Post by Eve M. Behr
Once I figure some more out I might be able to ask decent questions.
By all means...

///Peter, away on vacation for a few weeks now...byebye
Richard Tobin
2007-06-29 10:01:58 UTC
Post by Peter Flynn
OK...you need the following (you probably have 1/2/3 already)
1. a copy of the TEI Lite DTD and/or Schema that the documents reference
2. a knowledge of XML
3. an understanding of the TEI Lite markup
4. an XML editor, preferably one with an XSLT IDE
5. a knowledge of XSLT
6. an XSLT processor (eg Saxon)
7. time
Surely the TEI project already has XSL stylesheets for converting to
common formats?

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Richard Tobin
2007-06-29 10:03:40 UTC
Post by Richard Tobin
Surely the TEI project already has XSL stylesheets for converting to
common formats?
... and a Google search for "TEI stylesheets" returns lots of results.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Eve M. Behr
2007-06-29 14:38:05 UTC
Post by Richard Tobin
Post by Richard Tobin
Surely the TEI project already has XSL stylesheets for converting to
common formats?
... and a Google search for "TEI stylesheets" returns lots of results.
-- Richard
Hi, I got the help I needed via private email. However the issue
wasn't that I didn't have stylesheets. I had almost too many. I
didn't know where they went on my drive nor how my xml document was
intended to access them from my drive. I was also surprised that the
stylesheet I was so generously given by email was in a slightly
different form than what I had been seeing as to headers and such. You
see, I never was able to find a stylesheet file, only stylesheet data
buried in a bunch of other narrative. If you're not familiar with the
syntax, it is difficult to pick out.

And no, the project I'm on does not provide a stylesheet. I'll now be
writing one myself. I will probably also not work on any other
project with the same manager because of his lack of reasonable
instructions.

This all isn't for my benefit. I can do these pages easily (albeit by
writing in HTML and then converting), it is so other people can join
in (I notice that nobody has since I stopped contributing five days
ago). I want to go on to other projects but I don't want the current
one to hang indefinitely for lack of reasonable instructions for
handling the text.

Anyway, thank you all for your help. My next step is setting up the
spreadsheet and deciding if it's the method I want to use to go to my
next phase.

Eve M. Behr