Discussion:
geometric analysis of web-documents
Klein Hesselink, Hans
2003-11-28 10:22:25 UTC
Hi all,

For my final thesis I'm working on a project where I would like to
enrich the annotation of 'structured' webdocuments. I am now searching
for information on how to do a geometrical analysis of webdocuments.
Although the layout is described by the HTML, I can't put my finger on
how to convert this information into a geometrically annotated web-document.

I hope this message will turn up some new ideas or other kinds of
solutions to this problem. Thanks a lot for the effort.

Greetings,

Hans Klein Hesselink
Tad McClellan
2003-11-28 15:13:51 UTC
Subject: geometric analysis of web-documents
What do you mean when you say "geometric analysis"?

I am not familiar with the term.
> For my final thesis I'm working on a project where I would like to
> enrich the annotation of 'structured' webdocuments.
When people who talk about markup say "structured", they are referring
to the structure of the _content_ (data).

(Your feeling that it should be quoted has me wondering exactly
what it is that is being discussed.)

Are you using the term that way, or do you mean some other
kind of "structure"?



There are many types of structured documents on the web,
HTML, XML, etc.

Are you planning to study all of them, or only HTML?

If it is HTML from arbitrary web sites, then it may not really
be valid (SG|HT)ML, since browsers can display invalid markup.

You may need to "normalize" the HTML using "HTML Tidy"
(http://tidy.sourceforge.net/).
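
As a rough sketch of that normalization step, the snippet below pipes markup through the `tidy` command-line tool from Python. It assumes a `tidy` executable is on the PATH; the helper name `tidy_html` is made up for illustration, and the option set (quiet, XHTML output, warnings suppressed) is only one reasonable choice.

```python
import shutil
import subprocess


def tidy_html(html):
    """Return a cleaned-up version of `html`, or None if tidy is unavailable.

    Hypothetical helper: it assumes the HTML Tidy command-line tool is
    installed and reachable as `tidy`.
    """
    exe = shutil.which("tidy")
    if exe is None:
        return None  # tidy is not installed on this machine

    result = subprocess.run(
        [exe, "-q", "-asxhtml", "--show-warnings", "no"],
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors it could not fix).
    if result.returncode >= 2:
        return None
    return result.stdout


if __name__ == "__main__":
    cleaned = tidy_html("<p>an unclosed paragraph")
    print(cleaned if cleaned is not None else "tidy not available or input unusable")
```

The cleaned output is well-formed, so it can be handed to a stricter parser afterwards.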
> Although the layout is described by the HTML
Oh! Does "geometrical analysis" mean how it is rendered?

The layout is _not_ described by the markup (HTML), the structure
is described. How that structure is rendered is up to the
application (eg: browser) that is interpreting the markup.
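
To make the structure-versus-layout distinction concrete, here is a small sketch that uses Python's standard `html.parser` to pull out what the markup actually carries: the nesting of elements. The class name `OutlineParser`, the helper `outline`, and the sample document are invented for illustration, and the sketch assumes reasonably well-formed input.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects (depth, tag) pairs: the element nesting and nothing more."""

    # Elements that never have an end tag, so they must not change the depth.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.depth > 0:
            self.depth -= 1


def outline(html):
    """Return the list of (depth, tag) pairs found in `html`."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


sample = "<html><body><h1>Title</h1><p>Some <em>text</em>.</p></body></html>"
for depth, tag in outline(sample):
    print("  " * depth + tag)
```

Nothing in the result is a coordinate, a width, or a font size. Unclosed tags in real-world pages would skew the depth counter, which is one more reason to normalize the markup first.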
> I can't put my finger on
> how to convert this information to a geometrically annotated web-document.
Each _browser_ decides how to render the markup.

Different "brands" of browser will/may use a different rendering.

Even the same browser may use different renderings between
different versions of the browser.

The same browser and same version may use a different rendering
when the size of the window is changed.
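
The window-size point can be illustrated with a toy model. The sketch below is not how any browser lays out text; it simply treats the "window" as a number of character columns and uses Python's `textwrap` to show that the same content yields different line geometry at different widths. The function name `line_boxes` is made up for this example.

```python
import textwrap


def line_boxes(text, width_chars):
    """Wrap `text` to `width_chars` columns.

    Returns one (row, column_count, line_text) tuple per wrapped line,
    a crude stand-in for the line boxes a real renderer would compute.
    """
    lines = textwrap.wrap(text, width=width_chars)
    return [(row, len(line), line) for row, line in enumerate(lines)]


paragraph = (
    "The same browser and same version may use a different rendering "
    "when the size of the window is changed."
)

narrow = line_boxes(paragraph, 30)
wide = line_boxes(paragraph, 60)

for name, boxes in (("narrow", narrow), ("wide", wide)):
    print(name, "window:", len(boxes), "lines")
    for row, cols, line in boxes:
        print(f"  row {row}: {cols:2d} columns | {line}")
```

Any geometric annotation is therefore tied to one particular rendering configuration, not to the document alone.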

Some browsers allow user-selected options that affect rendering.

Some browsers don't have much of a "geometric" aspect to them
at all, eg: HTML-to-voice for sightless web surfers.



You'd need about a bazillion fingers to cover all of
the possibilities.

Better start by limiting "webdocuments" to something more manageable. :-)
> Hope this message will return some new ideas,
If you are studying how a web document "looks", then what you
are really asking about is the behavior of a web browser,
rather than about SGML.

If not, then you can disregard just about everything I wrote above.
> or other kind of
> solutions to this problem.
Call PDF a "webdocument" so you'll have something that really
carries geometric information about the layout. <grin>
--
Tad McClellan SGML consulting
***@augustmail.com Perl programming
Fort Worth, Texas