Discussion:
normalizing sgml to xml - how to handle random tags
Luch
2006-06-06 02:05:07 UTC
Hello,
I've followed the advice I've read on websites and in this newsgroup and am
converting my SGML (OEM manual data from aircraft manufacturers such as
Boeing) to XML in order to work with it more easily. I'd like to have a
stylesheet so that the data can then be rendered as PDF. The approach we have
taken is:

1) Convert the SGML to XML (using SX/sgmlnorm).
2) Write an XSD (XML Schema) that corresponds to the SGML DTD.
3) Write an XML stylesheet, based on the XSD, that presents the XML data in a
readable format that can then be output as PDF.

We've made good progress and think we're almost there, but I wanted to know
whether this is still a good way to go. Is writing the XSD really beneficial, or
should that step be skipped and another method used?

And finally, the problems we've had:

1) SGML tags that have a start tag but no end tag. We fixed this by
closing the tags immediately.

The problem we haven't solved, and that I'd like help with:
2) SGML tags that appear "randomly" within the data. We do not understand
how to express these in the XSD. An example is in the DTD below, under the
"MISCELLANIOUS CONSTRUCTIONS" section at the bottom: tags such as effect,
revst, and revend. They appear randomly within the SGML/XML data, and we
can't work out how to model that in an XSD.

Below is the DTD:

<!-- ==================================================== -->
<!-- DTD HEADER -->
<!-- ==================================================== -->
<!-- -->
<!-- This DTD is for the Boeing Task Card Manual -->
<!-- used for all aircraft models. -->
<!-- -->
<!-- DTD reference: TCBOE16 -->
<!-- Revision Date: 02/27/03 -->
<!-- Highlight: -->
<!-- 1) Changed SEQNUM element to optional within the -->
<!-- TABLE element structure. TABLES are not all -->
<!-- sequenced within the Boeing authoring system. -->
<!-- 2) Changed version to 16 (TCBOE16.DTD). -->
<!-- -->
<!-- Revision History TCBOE15: -->
<!-- 1) Added SEQNUM element to GRAPHIC and TABLE for -->
<!-- sequence order. SEQNUM content is #PCDATA. -->
<!-- 2) Changed version to 15 (TCBOE15.DTD). -->
<!-- -->
<!-- Revision History TCBOE14: -->
<!-- 1) Modified %g.r; entity to include REFBLOCK. -->
<!-- 2) Added HAZMTLST and HAZMT element structures. -->
<!-- Added optional HAZMTLST to TCBODY structure. -->
<!-- HAZMTLST is used in the ATA DTDs. -->
<!-- 3) Added OIDATA and TSN attributes to TC element -->
<!-- to be the same as ATA DTDs. -->
<!-- 4) Added inclusion of REVST, REVEND, COCST, and -->
<!-- COCEND to TASKCARD element. Needed for eMOD -->
<!-- processing of Task Cards. -->
<!-- 5) Added CHAPPREF attribute to TASK, SUBTASK and -->
<!-- GRAPHIC to conform to the ATA AMM DTD. -->
<!-- 6) Changed the KEY attribute to ID in GRAPHIC -->
<!-- and SHEET. Added #IMPLIED ID attribute to Task-->
<!-- and SUBTASK. -->
<!-- 7) Added EQU, SUB and SUPER element choices to -->
<!-- PARA structure. All have #PCDATA content. -->
<!-- 8) Added DOCNBR attribute to REFEXT. -->
<!-- 9) Changed attribute REFID to #IMPLIED in -->
<!-- REFINT. -->
<!-- 10) Added REFBLOCK element definition. Used in -->
<!-- %g.r.; entity. -->
<!-- 11) Added EFFECT as an inclusion to WARNING, -->
<!-- CAUTION and NOTE. -->
<!-- 12) Changed attribute ALIGN to #IMPLIED in -->
<!-- COLSPEC element. -->
<!-- 13) Changed attribute SPANNAME to #IMPLIED in -->
<!-- SPANSPEC element. -->
<!-- 14) Added GRPAHIC choice to TABLE ENTRY element. -->
<!-- 15) Changed SHEETNBR attribute to #IMPLIED in -->
<!-- element SHEET. -->
<!-- 16) Changed the TITLE element structure to match -->
<!-- the ATA AMM including choices of EIN, SBNBR, -->
<!-- SUB and SUPER. -->
<!-- 17) Added element SBNBR for use in TITLE content. -->
<!-- 18) Changed version to 14 (TCBOE14.DTD). -->
<!-- 19) Editorial changes for readability. -->
<!-- -->
<!-- Revision History TCBOE13: -->
<!-- 1) Changed TCBOE11 into a valid production -->
<!-- Task Card DTD. This DTD does not include -->
<!-- the EMOD draft changes. They will be added -->
<!-- when eMOD is ready to produce Task Cards. -->
<!-- 2) Changed dtd version number to 13. -->
<!-- -->
<!-- Revision History TCBOE12: -->
<!-- 1) Draft TC DTD submitted to SRG by eMOD. -->
<!-- NOT IN USE YET! -->
<!-- -->
<!-- Revision Highlight TCBOE11: -->
<!-- 1) Changed PARA+ content to %TEXT; for INTNOTE, -->
<!-- APAPNOTE, EAPPNOTE, ACPANOTE, TASKPROC & -->
<!-- TASKDESC. -->
<!-- 2) Changed PRETOPIC content to align with the -->
<!-- AMM DTD. -->
<!-- 3) Changed content of RTASKNBR to a repeating -->
<!-- choice of #PCDATA or REFINT. -->
<!-- 4) Changed APAPNOTE to optional repeating in -->
<!-- APAPDATA. -->
<!-- 5) Changed EAPPNOTE to optional repeating in -->
<!-- ENAPDATA. -->
<!-- 6) Changed ACPANOTE to optional repeating in -->
<!-- ACDATA. -->
<!-- 7) Changed SUBTASK attribute SEQ to CDATA to be -->
<!-- consistent with the AMM. -->
<!-- 8) Changed dtd version number to 11. -->
<!-- -->
<!-- Revision History TCBOE10: -->
<!-- 1) Added new optional element tccat. -->
<!-- 2) Added new optional elements tc_manhours, -->
<!-- tc_nbrpeopl, and tc_elaptime. -->
<!-- 3) Added ENTITY character sets %ISOgrk2; ISOgrk3 -->
<!-- %ISOgrk4; & %ISOlat1; to keep in alignment -->
<!-- with the ATA DTDs. -->
<!-- 4) Added implied attribute IMGAREA to SHEET. This-->
<!-- is used for identifying the graphic sheet -->
<!-- image area on the paper output. -->
<!-- 5) Added optional LEGALNTC element structure to -->
<!-- to document top-level. This is used for -->
<!-- tagging Legal Notice (i.e. Copyright, etc.). -->
<!-- ==================================================== -->

<!-- pw<!DOCTYPE tc [ -->
<!-- The following set of declarations may be referred to
using a public entity as follows:
<!DOCTYPE tc PUBLIC
"-//BOEING//DTD TASKCARD-BOEING-VER16-LEVEL4//EN" -->

<!-- ==================================================== -->

<!-- ==================================================== -->
<!-- NOTATIONS -->
<!-- ==================================================== -->
<!NOTATION cgm PUBLIC
"-//USA-DOD//NOTATION Computer Graphics Metafile//EN" >
<!NOTATION ccitt4 PUBLIC
"-//USA-DOD//NOTATION CCITT Group 4 Facsimile//EN" >
<!-- ==================================================== -->

<!-- ==================================================== -->
<!-- ENTITIES -->
<!-- ==================================================== -->
<!ENTITY % g.r "(refblock* | (grphcref*, refext*,
refint*))" >
<!ENTITY % w.c "(warning*, caution*)" >

<!ENTITY % text "(para | note | table | unlist
| numlist)+" >

<!ENTITY % yesorno "NUMBER" >

<!ENTITY % deleted "(deleted, chgdesc*)" >

<!ENTITY % revatt
"chg (N|R|U|D) #REQUIRED
key ID #REQUIRED
revdate NUMBER #IMPLIED" >

<!ENTITY % ISOtech PUBLIC
"ISO 8879-1986//ENTITIES General Technical//EN" >
<!ENTITY % ISOpub PUBLIC
"ISO 8879-1986//ENTITIES Publishing//EN" >
<!ENTITY % ISOnum PUBLIC
"ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN">
<!ENTITY % ISOgrk1 PUBLIC
"ISO 8879-1986//ENTITIES Greek Letters//EN" >
<!ENTITY % ISOgrk2 PUBLIC
"ISO 8879-1986//ENTITIES Monotoniko Greek//EN" >
<!ENTITY % ISOgrk3 PUBLIC
"ISO 8879-1986//ENTITIES Greek Symbols//EN" >
<!ENTITY % ISOgrk4 PUBLIC
"ISO 8879-1986//ENTITIES Alternative Greek Symbols//EN">
<!ENTITY % ISOlat1 PUBLIC
"ISO 8879-1986//ENTITIES Added Latin 1//EN" >

%ISOtech; %ISOpub; %ISOnum; %ISOgrk1; %ISOgrk2; %ISOgrk3;
%ISOgrk4; %ISOlat1;

<!-- ==================================================== -->
<!-- TASKCARD HIGH LEVEL STRUCTURE -->
<!-- ==================================================== -->
<!ELEMENT tc - - (title, gchgdesc*, legalntc?, taskcard+)
+(revst | revend | cocst | cocend | hotlink)>

<!ATTLIST tc
model CDATA #REQUIRED
revdate NUMBER #REQUIRED
docnbr CDATA #IMPLIED
cus CDATA #IMPLIED
cusname CDATA #IMPLIED
lang CDATA #REQUIRED
spl CDATA #REQUIRED
oidate CDATA #REQUIRED
tsn CDATA #REQUIRED
chg (N|R|U) #REQUIRED >

<!ELEMENT gchgdesc - - (#PCDATA) >

<!-- =================================================== -->
<!-- Legal Notice Information -->
<!-- =================================================== -->

<!ELEMENT legalntc - - (exprtcl | cpyrght | proptary | geninfo)+>
<!ATTLIST legalntc
%revatt; >

<!ELEMENT proptary - - (fullstmt, partstmt?) >
<!ELEMENT fullstmt - - (title?, cpyrght?, para+) >
<!ELEMENT partstmt - - (title?, para+) >
<!ELEMENT exprtcl - - (para | unlist | numlist | note)+ >
<!ELEMENT cpyrght - - (year*, holder*, geninfo) >
<!ELEMENT year - - (#PCDATA) >
<!ELEMENT holder - - (#PCDATA) >
<!ELEMENT geninfo - - (para | unlist | numlist | note)+ >

<!-- ==================================================== -->
<!-- TASKCARD -->
<!-- ==================================================== -->
<!ELEMENT taskcard - - ((effect, chgdesc*, (mainhead | cchead),
tcbody, graphic*) | %deleted;)
+(revst | revend | cocst | cocend) >
<!ATTLIST taskcard
cardnbr CDATA #REQUIRED
mprog CDATA #REQUIRED
etopsind CDATA #IMPLIED
%revatt; >

<!ELEMENT chgdesc - - (#PCDATA) >
<!ELEMENT mainhead - - (source, skill+, workarea+,
reltask*, intdata, tasktype, title,
apapdata, enapdata, zone+, acdata?)>
<!ELEMENT source - - (#PCDATA) >
<!ELEMENT skill - - (#PCDATA) >
<!ELEMENT workarea - - (#PCDATA) >

<!ELEMENT reltask - - (rtaskcd, rtasknbr) >
<!ELEMENT rtaskcd - - (#PCDATA) >
<!ELEMENT rtasknbr - - (#PCDATA | refint)+ >

<!ELEMENT intdata - - ((interval+, intnote?) | intnote) >
<!ELEMENT interval - - (version, thrshold, repeat?, phase?)>
<!ELEMENT version - - (#PCDATA) >
<!ELEMENT thrshold - - (thrshtyp, ((multiple, unit, sample)
| optask)) >
<!ELEMENT repeat - - (thrshtyp, multiple, unit, sample) >
<!ELEMENT thrshtyp - - (#PCDATA) >
<!ELEMENT multiple - - (#PCDATA) >

<!ELEMENT unit - - (#PCDATA) >
<!ELEMENT sample - - (#PCDATA) >
<!ELEMENT optask - - (#PCDATA) >

<!ELEMENT phase - - (mntcycle, pthrshld, prepeat) >
<!ELEMENT mntcycle - - (#PCDATA) >
<!ELEMENT pthrshld - - (#PCDATA) >
<!ELEMENT prepeat - - (#PCDATA) >
<!ELEMENT intnote - - (%text;) >

<!ELEMENT tasktype - - (#PCDATA) >

<!ELEMENT apapdata - - (apapplic*, apapnote*) >
<!ELEMENT apapplic - - (#PCDATA) >

<!ELEMENT apapnote - - (%text;) >

<!ELEMENT enapdata - - (engappl*, eappnote*) >
<!ELEMENT engappl - - (#PCDATA) >
<!ELEMENT eappnote - - (%text;) >

<!ELEMENT acdata - - (acpan*, acpanote*) >
<!ELEMENT acpan - - (#PCDATA) >
<!ELEMENT acpanote - - (%text;) >

<!ELEMENT zone - - (#PCDATA) >

<!ELEMENT cchead - - (skill+, workarea+, reltask*,
tasktype, title, apapdata, enapdata,
zone+, acdata?) >

<!ELEMENT tcbody - - (tccat?, taskproc?, mpditem*,
pretsmry?, hazmtlst?, task*) >

<!ELEMENT tccat - - (#PCDATA) >

<!ELEMENT mpditem - - (mpdnbr, taskloc?, linspect?, title?,
taskdesc?, ch5tskno?, awlimit?,
supldata) >
<!ELEMENT taskproc - - (%text;) >
<!ELEMENT mpdnbr - - (#PCDATA) >
<!ELEMENT taskloc - - (#PCDATA) >
<!ELEMENT linspect - - (#PCDATA) >
<!ELEMENT taskdesc - - (%text;) >
<!ELEMENT awlimit - - (#PCDATA) >
<!ELEMENT ch5tskno - - (#PCDATA) >
<!ELEMENT pretsmry - - (pretopic+) >

<!-- Supplemental MPD Data: man hours, number of people, -->
<!-- msg-3 task category code, msg-3 failure effect cate- -->
<!-- gory code, elapsed time (comes from MPS database) -->

<!ELEMENT supldata - - (manhours, nbrpeopl, elaptime,
tc_manhours?, tc_nbrpeopl?,
tc_elaptime?, taskctgy?, feftctgy*)>
<!ELEMENT manhours - - (#PCDATA) >
<!ELEMENT nbrpeopl - - (#PCDATA) >
<!ELEMENT elaptime - - (#PCDATA) >
<!ELEMENT tc_manhours - - (#PCDATA) >
<!ELEMENT tc_nbrpeopl - - (#PCDATA) >
<!ELEMENT tc_elaptime - - (#PCDATA) >
<!ELEMENT taskctgy - - (#PCDATA) >
<!ELEMENT feftctgy - - (#PCDATA) >

<!-- ==================================================== -->
<!-- HAZARDOUS MATERIAL STRUCTURE -->
<!-- ==================================================== -->
<!ELEMENT hazmtlst - - (title, pretopic*, hazmt+) >
<!ATTLIST hazmtlst
id ID #REQUIRED >

<!ELEMENT hazmt - - (title, warning) >
<!ATTLIST hazmt
id ID #REQUIRED >

<!-- ==================================================== -->
<!-- TASK -->
<!-- ==================================================== -->
<!ELEMENT task - - (effect, title, %g.r;, %w.c;, note*,
tfmatr?, topic*, graphic*) >
<!ATTLIST task
chappref CDATA #IMPLIED
chapnbr NUMBER #REQUIRED
sectnbr NUMBER #REQUIRED
subjnbr NUMBER #REQUIRED
func CDATA #REQUIRED
seq CDATA #REQUIRED
confltr CDATA #IMPLIED
varnbr NUMBER #IMPLIED
alunqi CDATA #IMPLIED
pgblknbr NUMBER #IMPLIED
confnbr NUMBER #IMPLIED
id ID #IMPLIED >

<!ELEMENT tfmatr - - (pretopic+) >
<!ELEMENT pretopic - - (title, (%text; | list1)) +(effect) >

<!ELEMENT topic - - (effect?, title, %g.r;, %w.c;, note*,
subtask*) >

<!-- ==================================================== -->
<!-- SUBTASK -->
<!-- ==================================================== -->
<!ELEMENT subtask - - (effect, %g.r;, %w.c;, note*,
list1) +(effect) >
<!ATTLIST subtask
chappref CDATA #IMPLIED
chapnbr NUMBER #REQUIRED
sectnbr NUMBER #REQUIRED
subjnbr NUMBER #REQUIRED
func CDATA #REQUIRED
seq CDATA #REQUIRED
confltr CDATA #IMPLIED
varnbr NUMBER #IMPLIED
alunqi CDATA #IMPLIED
pgblknbr NUMBER #IMPLIED
confnbr NUMBER #IMPLIED
id ID #IMPLIED >

<!-- ==================================================== -->
<!-- PARA, REFERENCE ELEMENTS -->
<!-- ==================================================== -->
<!ELEMENT para - - (#PCDATA | cb | con | ein | equ
| grphcref | ncon | pan | refext
| refint | std | sub | super
| ted | txtgrphc | zone)+ >

<!ELEMENT refext - - (#PCDATA) >
<!ATTLIST refext
refman CDATA #IMPLIED
refloc CDATA #IMPLIED
refspl CDATA #IMPLIED
docnbr CDATA #IMPLIED
refmodel CDATA #IMPLIED >

<!ELEMENT refint - - (#PCDATA) >
<!ATTLIST refint
reftype CDATA #IMPLIED
refid IDREF #IMPLIED >

<!ELEMENT refblock - - (#PCDATA | refint | refext | grphcref)*>

<!ELEMENT grphcref - - (effect?, #PCDATA) >
<!ATTLIST grphcref
refid IDREF #IMPLIED
sheetnbr CDATA #IMPLIED
shownow %yesorno; "0"
structid CDATA #IMPLIED >

<!ELEMENT cb - - (#PCDATA) >
<!ELEMENT con - - (connbr, conname) >
<!ELEMENT connbr - - (#PCDATA) >
<!ELEMENT conname - - (#PCDATA) >
<!ELEMENT ein - - (#PCDATA) >
<!ELEMENT equ - - (#PCDATA) >
<!ELEMENT ncon - - (#PCDATA) >
<!ELEMENT pan - - (#PCDATA) >
<!ELEMENT std - - (stdnbr, stdname) >
<!ELEMENT stdnbr - - (#PCDATA) >
<!ELEMENT stdname - - (#PCDATA) >
<!ELEMENT sub - - (#PCDATA) >
<!ELEMENT super - - (#PCDATA) >
<!ELEMENT ted - - (toolnbr, toolname) >
<!ELEMENT toolnbr - - (#PCDATA) >
<!ELEMENT toolname - - (#PCDATA) >

<!-- ==================================================== -->
<!-- WARNING, CAUTION, NOTE -->
<!-- ==================================================== -->
<!ELEMENT warning - - (%text;) -(note) +(effect) >
<!ELEMENT caution - - (%text;) -(note) +(effect) >
<!ELEMENT note - - (%text;) -(note) +(effect) >

<!-- ==================================================== -->
<!-- NUMBERED AND UNNUMBERED LISTS -->
<!-- ==================================================== -->
<!ELEMENT unlist - - (unlitem+) >
<!ATTLIST unlist
bulltype CDATA #IMPLIED >
<!ELEMENT unlitem - - (para+) >

<!ELEMENT numlist - - (numlitem+) >
<!ATTLIST numlist
numtype CDATA #IMPLIED >
<!ELEMENT numlitem - - (para+) >

<!-- ==================================================== -->
<!-- TABLE (CALS) -->
<!-- ==================================================== -->
<!ELEMENT table - - ((seqnum?, title?, tgroup, ftnote*)
| graphic+) -(table) >
<!ATTLIST table
frame (top|bottom|topbot
|all|sides|none) #IMPLIED
colsep %yesorno; #IMPLIED
rowsep %yesorno; #IMPLIED
orient (port|land) #IMPLIED
pgwide %yesorno; #IMPLIED
id ID #IMPLIED >

<!ELEMENT tgroup - o (colspec*, spanspec*, thead?,
tfoot?, tbody) >
<!ATTLIST tgroup
cols NUMBER #REQUIRED
colsep %yesorno; #IMPLIED
rowsep %yesorno; #IMPLIED
align (left|right|center
|justify|char) "left"
charoff NUTOKEN "50"
char CDATA " " >

<!ELEMENT colspec - o EMPTY >
<!ATTLIST colspec
colnum NUMBER #IMPLIED
colname NMTOKEN #IMPLIED
align (left|right|center
|justify|char) #IMPLIED
charoff NUTOKEN #IMPLIED
char CDATA #IMPLIED
colwidth CDATA #IMPLIED
rowsep %yesorno; #IMPLIED
colsep %yesorno; #IMPLIED >

<!ELEMENT spanspec - o EMPTY >
<!ATTLIST spanspec
namest NMTOKEN #REQUIRED
nameend NMTOKEN #REQUIRED
spanname NMTOKEN #IMPLIED
align (left|right|center
|justify|char) "center"
charoff NUTOKEN #IMPLIED
char CDATA #IMPLIED
colsep %yesorno; #IMPLIED
rowsep %yesorno; #IMPLIED >

<!ELEMENT thead - o (colspec*, row+) >
<!ATTLIST thead
valign (top|middle|bottom) "bottom" >

<!ELEMENT tfoot - o (colspec*, row+) >
<!ATTLIST tfoot
valign (top|middle|bottom) "top" >
<!ELEMENT tbody - o (row+) >
<!ATTLIST tbody
valign (top|middle|bottom) "top" >

<!ELEMENT row - o (entry+) >
<!ATTLIST row
rowsep %yesorno; #IMPLIED >

<!ELEMENT entry - o (%text; | %w.c; | graphic)* >
<!ATTLIST entry
colname NMTOKEN #IMPLIED
namest NMTOKEN #IMPLIED
nameend NMTOKEN #IMPLIED
spanname NMTOKEN #IMPLIED
morerows NUMBER "0"
colsep %yesorno; #IMPLIED
rowsep %yesorno; #IMPLIED
rotate %yesorno; "0"
valign (top|middle|bottom) "top"
align (left|right|center
|justify|char) #IMPLIED
charoff NUTOKEN #IMPLIED
char CDATA #IMPLIED >

<!ELEMENT seqnum - - (#PCDATA) >


<!-- ==================================================== -->
<!-- NUMBERED, NESTED LISTS -->
<!-- ==================================================== -->
<!ELEMENT list1 - - (l1item+) >
<!ELEMENT list2 - - (l2item+) >
<!ELEMENT list3 - - (l3item+) >
<!ELEMENT list4 - - (l4item+) >
<!ELEMENT list5 - - (l5item+) >
<!ELEMENT list6 - - (l6item+) >
<!ELEMENT list7 - - (l7item+) >

<!ELEMENT l1item - - (%w.c;, %text;, (list2, note*)?)
-(list1) >
<!ELEMENT l2item - - (%w.c;, %text;, (list3, note*)?)
-(list1) >
<!ELEMENT l3item - - (%w.c;, %text;, (list4, note*)?)
-(list1) >
<!ELEMENT l4item - - (%w.c;, %text;, (list5, note*)?)
-(list1) >
<!ELEMENT l5item - - (%w.c;, %text;, (list6, note*)?)
-(list1) >
<!ELEMENT l6item - - (%w.c;, %text;, (list7, note*)?)
-(list1) >
<!ELEMENT l7item - - (%w.c;, %text;) -(list1) >

<!-- ==================================================== -->
<!-- GRAPHIC -->
<!-- ==================================================== -->
<!ELEMENT graphic - - (effect, seqnum, title?, sheet+) >
<!ATTLIST graphic
chappref CDATA #IMPLIED
chapnbr NUMBER #IMPLIED
sectnbr NUMBER #IMPLIED
subjnbr NUMBER #IMPLIED
func CDATA #IMPLIED
seq CDATA #IMPLIED
confltr CDATA #IMPLIED
varnbr NUMBER #IMPLIED
alunqi CDATA #IMPLIED
pgblknbr NUMBER #IMPLIED
confnbr NUMBER #IMPLIED
id ID #REQUIRED >

<!ELEMENT sheet - - (effect, title?, gdesc?) >
<!ATTLIST sheet
gnbr ENTITY #REQUIRED
sheetnbr CDATA #IMPLIED
imgarea (AP|BP|CP|DL|EL|FP|GP|HL|IL) #IMPLIED
id ID #REQUIRED >
<!ELEMENT gdesc - - (unlist | numlist) >

<!-- ==================================================== -->
<!-- MISCELLANIOUS CONSTRUCTIONS: -->
<!-- MARK EFFECTIVITY -->
<!-- TITLE, REVISION MARKERS -->
<!-- HOTLINK, COC ELEMENTS -->
<!-- DELETED ANCHORS, TEXT GRAPHICS -->
<!-- EMPTY ELEMENTS -->
<!-- ==================================================== -->
<!ELEMENT effect - - ((sbeff | coceff)*) >
<!ATTLIST effect
effrg CDATA #IMPLIED
efftext CDATA #IMPLIED >

<!ELEMENT sbeff - o EMPTY >
<!ATTLIST sbeff
effrg CDATA #IMPLIED
efftext CDATA #IMPLIED
sbnbr CDATA #REQUIRED
sbcond CDATA #REQUIRED
sbrev CDATA #IMPLIED >

<!ELEMENT coceff - o EMPTY >
<!ATTLIST coceff
effrg CDATA #IMPLIED
efftext CDATA #IMPLIED
cocnbr CDATA #REQUIRED >

<!ELEMENT title - - (#PCDATA | ein | sbnbr | sub
| super)+ >

<!ELEMENT sbnbr - - (#PCDATA) >

<!ELEMENT revst - o EMPTY >
<!ELEMENT revend - o EMPTY >
<!ELEMENT cocst - o EMPTY >
<!ELEMENT cocend - o EMPTY >

<!ELEMENT hotlink - o EMPTY >
<!ATTLIST hotlink
targetrefid CDATA #IMPLIED
targetid CDATA #IMPLIED >

<!ELEMENT txtgrphc - - (txtline+) >
<!ELEMENT txtline - - (#PCDATA) >

<!ELEMENT deleted - o EMPTY >

<!-- ==================================================== -->
<!-- FOOTNOTE AND FOOTNOTE REFERENCES -->
<!-- ==================================================== -->
<!ELEMENT ftnote - - (%text;) >
<!ATTLIST ftnote
ftnoteid ID #REQUIRED >
<!-- pw ]> -->
Peter Flynn
2006-06-06 19:55:57 UTC
Post by Luch
Hello,
I've followed the advice I've read on websites and in this newsgroup and am
converting my SGML (OEM manual data from aircraft manufacturers such as
Boeing) to XML in order to work with it more easily. I'd like to have a
stylesheet so that the data can then be rendered as PDF. The approach we have
taken is:
1) Convert the SGML to XML (using SX/sgmlnorm).
2) Write an XML XSD that corresponds to the SGML DTD.
3) Write an XML stylesheet, based on the XSD to show the XML data in a
pretty format that can then be shown in PDF.
We've made good progress and think we're almost there, but I wanted to know
whether this is still a good way to go. Is writing the XSD really beneficial, or
should that step be skipped and another method used?
Unless you need to add constraints or datatypes not supported in XML
DTDs or preserve some of the features from SGML DTDs, you may find it
easier just to turn the SGML DTD into an XML DTD. Details of doing this
are in the XML FAQ at http://xml.silmaril.ie/developers/dtdconv/
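
As a rough illustration of the mechanical part of that conversion, the Python sketch below strips the SGML tag-omission flags from a single element declaration and separates out the inclusion and exclusion exceptions, which XML DTDs cannot express. The function name is hypothetical, and it assumes that exceptions are preceded by whitespace and contain no nested parentheses, which holds for the declarations in the DTD above. It is a sketch, not a complete converter.

```python
import re


def _names(groups, sign):
    """Collect element names from exception groups with the given sign."""
    return [n.strip() for s, g in groups if s == sign for n in g.split("|")]


def sgml_to_xml_element_decl(decl):
    """Return (xml_declaration, included_names, excluded_names).

    Hypothetical helper: handles one <!ELEMENT ...> declaration that
    carries SGML tag-omission flags such as "- -" or "- o".
    """
    match = re.match(
        r"\s*<!ELEMENT\s+(\S+)\s+[-oO]\s+[-oO]\s+(.*?)\s*>\s*$", decl, re.S
    )
    if match is None:
        raise ValueError("not an SGML element declaration with omission flags")
    name, body = match.groups()
    body = " ".join(body.split())  # collapse line breaks and runs of spaces
    # Exceptions follow the content model as "+(a | b)" or "-(a)".
    groups = re.findall(r"\s([+-])\(([^()]*)\)", body)
    model = re.sub(r"\s[+-]\([^()]*\)", "", body)
    return f"<!ELEMENT {name} {model}>", _names(groups, "+"), _names(groups, "-")


if __name__ == "__main__":
    print(sgml_to_xml_element_decl(
        "<!ELEMENT warning - - (%text;) -(note) +(effect) >"
    ))
```

For the warning declaration this yields <!ELEMENT warning (%text;)> together with effect as an inclusion and note as an exclusion. Other SGML-only features still need manual attention: for instance, the NUMBER and NUTOKEN attribute types do not exist in XML, and a model such as (effect?, #PCDATA) is not a legal XML mixed-content model.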
Post by Luch
And finally, the problems we've had
1) SGML tags that have a start but no ending tag... we fixed this by just
immediately closing the tags.
Yep.
Post by Luch
2) SGML tags that appear "randomly" within the data. We do not understand
how to put these into the XSD. An example is in the attached DTD. Under the
"MISCELLANIOUS CONSTRUCTIONS" section at the bottom. Tags such as: effect ,
revst , revend . They will appear randomly within the SGML/XML data, and we
can't understand how to pattern an XSD after that.
You can't do this in XML: Inclusion Exceptions (the leading +) and
Exclusion Exceptions (the leading -) were removed in XML. You have to
add the element types to each content model you need them in.
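
One way to work out which content models need the extra element types is to propagate each inclusion down through the element tree. The Python sketch below does that for a hand-trimmed, hypothetical slice of the DTD. The CHILDREN and INCLUSIONS tables and the function name are assumptions made for illustration, not generated from the real file, and exclusion exceptions are ignored.

```python
from collections import deque

# Hypothetical, heavily trimmed view of the DTD: element -> child elements.
CHILDREN = {
    "tc": ["title", "taskcard"],
    "taskcard": ["effect", "chgdesc", "tcbody"],
    "tcbody": ["task"],
    "task": ["effect", "title"],
    "title": [],
    "chgdesc": [],
    "effect": [],
}

# Inclusion exceptions as declared with "+(...)" in the SGML DTD.
INCLUSIONS = {
    "tc": {"revst", "revend", "cocst", "cocend", "hotlink"},
    "taskcard": {"revst", "revend", "cocst", "cocend"},
}


def active_inclusions(root):
    """Return {element: set of included element names allowed inside it}."""
    active = {}
    queue = deque([(root, frozenset())])
    while queue:
        name, inherited = queue.popleft()
        allowed = inherited | INCLUSIONS.get(name, set())
        # Skip elements already recorded with at least these inclusions.
        if name in active and allowed <= active[name]:
            continue
        active[name] = active.get(name, frozenset()) | allowed
        for child in CHILDREN.get(name, []):
            queue.append((child, active[name]))
    return active


if __name__ == "__main__":
    for element, names in sorted(active_inclusions("tc").items()):
        print(element, sorted(names))
```

In this trimmed example every element below tc inherits revst, revend, cocst, cocend and hotlink, so each of their XML content models would need those names added as alternatives. Exclusions such as -(note) would have to be subtracted during the same walk.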

///Peter
--
XML FAQ: http://xml.silmaril.ie/
xmlizer
2006-06-07 16:01:18 UTC
Hello
Post by Luch
Hello,
I've followed the advice I've read on websites and in this newsgroup and am
converting my SGML (OEM manual data from aircraft manufacturers such as
Boeing) to XML in order to work with it more easily. I'd like to have a
stylesheet so that the data can then be rendered as PDF. The approach we have
taken is:
1) Convert the SGML to XML (using SX/sgmlnorm).
2) Write an XML XSD that corresponds to the SGML DTD.
3) Write an XML stylesheet, based on the XSD to show the XML data in a
pretty format that can then be shown in PDF.
We've made good progress and think we're almost there, but I wanted to know
whether this is still a good way to go.
It is not a bad way.
Post by Luch
Is writing the XSD really beneficial, or
should that step be skipped and another method used?
That depends on what you want to do with your XML data. XML Schema adds
datatypes and some other specific features on top of what an XML DTD
offers. The main reason to use XML Schema is that it is widely supported
by XML tools.

For someone coming from SGML, RELAX NG (http://relaxng.org/) is a better
fit, but for the moment it is not available in all tools.
Post by Luch
And finally, the problems we've had
1) SGML tags that have a start but no ending tag... we fixed this by just
immediately closing the tags.
Hmm, strange: sx normally closes them properly, but you have to pass
-xempty on the command line.
Post by Luch
2) SGML tags that appear "randomly" within the data.
You may not be familiar with this part of SGML: these are inclusion and exclusion exceptions.
Post by Luch
We do not understand how to put these into the XSD.
It is not directly possible in XML Schema (in fact it can be done, but it
is not always simple, and you may lose constraints compared with the SGML
DTD).
Post by Luch
An example is in the attached DTD. Under the "MISCELLANIOUS CONSTRUCTIONS" section at the bottom. Tags such as: effect ,
revst , revend . They will appear randomly within the SGML/XML data, and we
can't understand how to pattern an XSD after that.
I can give you some hints if you mail me directly at
INNOVIMAX &at; GMAIL &dot; COM

They are declared here:
Post by Luch
<!ELEMENT taskcard - - ((effect, chgdesc*, (mainhead | cchead),
tcbody, graphic*) | %deleted;)
+(revst | revend | cocst | cocend) >
The declaration says that the "revst", "revend", "cocst" and "cocend"
elements can appear anywhere inside a "taskcard" element or any of its
sub-elements.
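
To see where such markers end up after normalization, a short Python sketch using the standard-library xml.etree.ElementTree can list the parent of every marker element. The sample document is a made-up, well-formed fragment (it is not valid against the DTD) and the function name is hypothetical; both are used only for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: markers appear at the taskcard level and deep inside a title.
SAMPLE = (
    "<taskcard>"
    "<revst/><chgdesc>Changed step</chgdesc>"
    "<tcbody><task><title>Inspect <revst/>panel<revend/></title></task></tcbody>"
    "<revend/>"
    "</taskcard>"
)

MARKERS = {"revst", "revend", "cocst", "cocend"}


def marker_parents(xml_text):
    """Return (parent tag, marker tag) pairs in document order of the parents."""
    root = ET.fromstring(xml_text)
    found = []
    for parent in root.iter():
        for child in parent:
            if child.tag in MARKERS:
                found.append((parent.tag, child.tag))
    return found


if __name__ == "__main__":
    for parent, marker in marker_parents(SAMPLE):
        print(f"{marker} found inside {parent}")
```

Running this over real normalized data would show which element types actually contain the markers, which in turn indicates which content models need them in an XML DTD or schema.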

Xmlizer
