FOUNDATIONS OF MARKUP: SGML and XML
Currently under construction
The Standard Generalized Markup Language (SGML)
is a metalanguage for writing Document Type Definitions (DTDs).
A DTD is, essentially, an extended context-free grammar in which
the right-hand sides of productions resemble regular expressions
and are called content models.
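To make the grammar view concrete, here is a minimal sketch in Python. The DTD fragment, the element names, and the helper children_match are all hypothetical, and the content models are translated by hand into Python regular expressions; a real SGML or XML parser does considerably more than this.

```python
import re

# Hypothetical DTD fragment, read as two grammar productions:
#   <!ELEMENT article (title, author+, section*)>
#   <!ELEMENT section (title, para*)>
# Each content model (the right-hand side) is translated by hand into a
# regular expression over child element names, each followed by a space.
CONTENT_MODELS = {
    "article": r"title (author )+(section )*",
    "section": r"title (para )*",
}

def children_match(element, children):
    """Return True if the child sequence satisfies the element's content model."""
    word = "".join(name + " " for name in children)
    return re.fullmatch(CONTENT_MODELS[element], word) is not None

print(children_match("article", ["title", "author", "section"]))
print(children_match("article", ["author", "title"]))
```

The first call accepts the child sequence and the second rejects it, because the content model for article requires title to come first.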
The Extensible Markup Language (XML) is, essentially, a
simplified version of SGML used to specify DTDs
for Web-based documents.
Anne Brueggemann-Klein and I began investigating SGML
in the early nineties; the investigation led to a number of
publications about ambiguity, in the SGML sense, for content models.
The results carry over directly to XML.
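As an illustration of ambiguity in this sense, the sketch below tests whether a content model is deterministic. It assumes the characterization that a content model is unambiguous exactly when its position (Glushkov) automaton is deterministic, that is, when no state can be followed by two different occurrences of the same element name. The tuple encoding of content models and all function names are hypothetical, and the SGML "&" connector is not handled.

```python
from itertools import count

# Content models as nested tuples (a hypothetical encoding):
#   ("sym", name), ("seq", e1, e2), ("alt", e1, e2),
#   ("star", e), ("plus", e), ("opt", e)

def mark(expr, counter, names):
    """Replace every symbol occurrence by a unique position number."""
    if expr[0] == "sym":
        pos = next(counter)
        names[pos] = expr[1]
        return ("pos", pos)
    return (expr[0],) + tuple(mark(e, counter, names) for e in expr[1:])

def analyse(expr, follow):
    """Return (nullable, first, last) and fill in the follow sets."""
    kind = expr[0]
    if kind == "pos":
        follow.setdefault(expr[1], set())
        return False, {expr[1]}, {expr[1]}
    if kind in ("seq", "alt"):
        n1, f1, l1 = analyse(expr[1], follow)
        n2, f2, l2 = analyse(expr[2], follow)
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                      # e1 may be followed by the start of e2
            follow[p] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
    n, f, l = analyse(expr[1], follow)    # star, plus, opt
    if kind in ("star", "plus"):
        for p in l:                       # the body may repeat
            follow[p] |= f
    return n or kind in ("star", "opt"), f, l

def is_deterministic(expr):
    """True if no state has two successor positions with the same name."""
    names, follow = {}, {}
    _, first, _ = analyse(mark(expr, count(), names), follow)
    for successors in [first, *follow.values()]:
        labels = [names[p] for p in successors]
        if len(labels) != len(set(labels)):
            return False
    return True

b, c, d = ("sym", "b"), ("sym", "c"), ("sym", "d")
print(is_deterministic(("alt", ("seq", b, c), ("seq", b, d))))  # ((b, c) | (b, d))
print(is_deterministic(("seq", b, ("alt", c, d))))              # (b, (c | d))
```

The first model is reported as ambiguous, since an initial b could match either occurrence of b; the second, which describes the same sequences, is deterministic.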
Currently we are investigating XML with the aim of producing an
XML tokenizer and parser generator using standard compiler-writing
techniques.
Darrell Raymond, Frank Tompa and I attempted to address
the questions of what markup is and what appropriate meta-semantics
for SGML would be.
Pekka Kilpelainen, Helen Cameron, Chris Cleverley and I
examined exceptions and their expressive power, the
decidability of structural equivalence of DTDs, and how tag minimization
can be defined in a general way.
Last updated by Derick Wood, 27/01/2004