Markus Kuhn
2005-07-04 14:10:20 UTC
I'd be interested in implementing a very simple SGML parser
for a fixed and small number of DTDs. Since the DTDs will be
sort of hardwired, it would be convenient to use a very low-footprint
parser that knows nothing about the syntax of human-readable
SGML DTDs, but operates on some simpler abstraction of the same
grammar.
What I have in mind in particular is a deterministic pushdown automaton
that uses as its input alphabet the union of all start tags, end tags
and PCDATA strings and as its stack alphabet the set of element
names. The purpose of this pushdown automaton is to detect the start and
end of each element in the input file, so this is really about SGML
with omitted tags (and not XML).
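To make the idea concrete, here is a minimal Python sketch of such an automaton for one small, hypothetical DTD. The element names doc, title, body and p, and the hand-written tables MODELS, OMIT_START and OMIT_END, are invented for illustration. Each element's content model is written out by hand as a DFA, and all PCDATA strings collapse into a single input symbol "#PCDATA" so that the input alphabet stays finite. Strictly, the stack symbols are (element name, content-model state) pairs rather than bare element names, because the parent's position in its content model has to be restored after each pop. Omitted tags are inferred using a simplified reading of SGML's minimization rules. It ignores inclusions, exclusions, and the finer conditions under which ISO 8879 permits an implied start tag.

```python
# Hypothetical hard-wired DTD (illustration only):
#   <!ELEMENT doc   - - (title, body)>
#   <!ELEMENT title - O (#PCDATA)>
#   <!ELEMENT body  O O (p*)>
#   <!ELEMENT p     - O (#PCDATA)>

# Content model of each element as a DFA: ({state: {symbol: next_state}}, accepting).
# "#PCDATA" stands for any text string; "#DOC" is a virtual root holding one <doc>.
MODELS = {
    "#DOC": ({0: {"doc": 1}, 1: {}}, {1}),
    "doc": ({0: {"title": 1}, 1: {"body": 2}, 2: {}}, {2}),
    "title": ({0: {"#PCDATA": 0}}, {0}),
    "body": ({0: {"p": 0}}, {0}),
    "p": ({0: {"#PCDATA": 0}}, {0}),
}
OMIT_START = {"body"}              # elements whose start tag may be omitted
OMIT_END = {"title", "body", "p"}  # elements whose end tag may be omitted


class StructureError(Exception):
    pass


def parse(tokens):
    """Map (kind, value) tokens, kind in {"start", "end", "text"}, to events
    (kind, name_or_text, implied) with every omitted tag made explicit."""
    stack = [["#DOC", 0]]  # entries: [element name, content-model state]
    events = []

    def step(symbol):
        # Advance the top element's content DFA on symbol; False if not allowed.
        top = stack[-1]
        nxt = MODELS[top[0]][0][top[1]].get(symbol)
        if nxt is None:
            return False
        top[1] = nxt
        return True

    def can_close(entry):
        # An end tag may be implied if it is omissible and the content is complete.
        return entry[0] in OMIT_END and entry[1] in MODELS[entry[0]][1]

    def accept(symbol, text):
        while not step(symbol):
            top = stack[-1]
            # 1. Imply a start tag: an allowed child whose content can begin with symbol.
            for child in MODELS[top[0]][0][top[1]]:
                if child in OMIT_START and symbol in MODELS[child][0][0]:
                    step(child)
                    stack.append([child, 0])
                    events.append(("start", child, True))
                    break
            else:
                # 2. Otherwise imply the end tag of the current element.
                if not can_close(top):
                    raise StructureError(f"{symbol} not allowed in {top[0]}")
                stack.pop()
                events.append(("end", top[0], True))
        if symbol == "#PCDATA":
            events.append(("text", text, False))
        else:
            stack.append([symbol, 0])
            events.append(("start", symbol, False))

    def close(name):
        if name not in MODELS or name == "#DOC":
            raise StructureError(f"unknown element </{name}>")
        while stack[-1][0] != name:
            if not can_close(stack[-1]):
                raise StructureError(f"</{name}> not allowed in {stack[-1][0]}")
            events.append(("end", stack.pop()[0], True))
        if stack[-1][1] not in MODELS[name][1]:
            raise StructureError(f"</{name}> before its content is complete")
        stack.pop()
        events.append(("end", name, False))

    for kind, value in tokens:
        if kind == "start":
            accept(value, None)
        elif kind == "text":
            accept("#PCDATA", value)
        else:
            close(value)
    # End of input: imply any remaining omissible end tags.
    while len(stack) > 1:
        if not can_close(stack[-1]):
            raise StructureError(f"missing </{stack[-1][0]}>")
        events.append(("end", stack.pop()[0], True))
    if stack[0][1] not in MODELS["#DOC"][1]:
        raise StructureError("no document element")
    return events


if __name__ == "__main__":
    # <doc><title>Hello<p>One<p>Two</doc>
    tokens = [("start", "doc"), ("start", "title"), ("text", "Hello"),
              ("start", "p"), ("text", "One"), ("start", "p"), ("text", "Two"),
              ("end", "doc")]
    for kind, value, implied in parse(tokens):
        print(kind, value, "(implied)" if implied else "")
```

For the sample input, the sketch reports an implied `</title>` and an implied `<body>`, then implied `</p>` and `</body>` end tags before the explicit `</doc>`. Because SGML requires unambiguous content models, at most one child should qualify for an implied start tag at any point. That property is what keeps the automaton deterministic, and the linear scan over children is only a convenience of this sketch. The part it leaves out is deriving MODELS from a real DTD. That step would typically compile each content model into a DFA, for example via a position (Glushkov) automaton.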
Before I reinvent the wheel (or even do the conversion manually),
is there already a piece of free software out there that parses
an SGML DTD and converts it into something resembling the
state transition table of a deterministic pushdown automaton?
It does not even have to deal with attributes, as I'm only
interested here in validating the element structure and
detecting omitted tags.
Thanks,
Markus
--
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain