FOUNDATIONS OF MARKUP: SGML and XML


The Standard Generalized Markup Language (SGML) is a meta-syntactic language for specifying Document Type Definitions (DTDs). A DTD is, essentially, an extended context-free grammar in which the right-hand sides of productions, called content models, are similar to regular expressions. The Extensible Markup Language (XML) is, essentially, a simplified version of SGML used to specify DTDs for Web-based documents.
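As a rough illustration of the grammar view described above, the following sketch treats a DTD as a table that maps each element name to a content model, written as a regular expression over the names of its children. The element names, the `CONTENT_MODELS` table and the `is_valid` helper are all hypothetical and chosen only for this example; the sketch ignores character data, attributes and the SGML-specific operators, so it is not a real validator.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD: one production per element. The right-hand side (the
# content model) is a regular expression over child element names, where
# each name is followed by a single space so that names cannot run together.
CONTENT_MODELS = {
    "article": r"title (section )+",   # a title, then one or more sections
    "section": r"title (para )*",      # a title, then zero or more paragraphs
    "title": r"",                      # no child elements
    "para": r"",                       # no child elements
}

def is_valid(element):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element type not declared in the (hypothetical) DTD
    # Encode the sequence of child element names as a string.
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(is_valid(child) for child in element)

good = ET.fromstring(
    "<article><title/><section><title/><para/><para/></section></article>"
)
bad = ET.fromstring("<article><section><title/></section></article>")

print(is_valid(good))
print(is_valid(bad))
```

The first document satisfies every production; the second is rejected because the children of `article` do not begin with a `title`.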

Anne Brueggemann-Klein and I began investigating SGML in the early nineties; the investigation led to a number of publications about ambiguity, in the SGML sense, for content models. The results carry over directly to XML. Currently we are investigating XML with the aim of producing an XML tokenizer and parser generator using standard compiler-writing techniques.
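To give a flavour of ambiguity in the SGML sense, here is a minimal sketch that assumes the usual position-based characterization: a content model is treated as unambiguous when no two distinct occurrences of the same element name compete, either at the start of the model or after any given occurrence. The tuple encoding of content models and the function names are assumptions made for this illustration, and the SGML `&` and `+` operators are omitted; it is not the implementation from the papers listed below.

```python
from itertools import count

# Content models as nested tuples (a hypothetical encoding):
#   ("sym", name), ("cat", a, b), ("alt", a, b), ("star", a), ("opt", a)

def analyse(node, counter, follow, label):
    """Return (nullable, first, last) for node; fill in follow and label."""
    kind = node[0]
    if kind == "sym":
        pos = next(counter)          # each symbol occurrence gets a position
        label[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind in ("star", "opt"):
        _, first, last = analyse(node[1], counter, follow, label)
        if kind == "star":
            for p in last:           # iteration can loop back to the start
                follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(node[1], counter, follow, label)
    n2, f2, l2 = analyse(node[2], counter, follow, label)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    # kind == "cat": the end of the left part is followed by the right part
    for p in l1:
        follow[p] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last

def is_one_unambiguous(model):
    """True if no position set holds two occurrences of the same name."""
    follow, label = {}, {}
    _, first, _ = analyse(model, count(), follow, label)
    for positions in [first, *follow.values()]:
        names = [label[p] for p in positions]
        if len(names) != len(set(names)):
            return False
    return True

a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
ambiguous = ("alt", ("cat", a, b), ("cat", a, c))   # (a, b) | (a, c)
factored = ("cat", a, ("alt", b, c))                # a, (b | c)

print(is_one_unambiguous(ambiguous))
print(is_one_unambiguous(factored))
```

In the first model a parser that sees an `a` cannot tell, without looking ahead, which occurrence of `a` it has matched; the factored model describes the same sequences without that choice.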

Darrell Raymond, Frank Tompa and I attempted to address the questions of what markup is and what meta-semantics are appropriate for SGML.

Pekka Kilpelainen, Helen Cameron, Chris Cleverley and I examined the issues of exceptions and their expressive power, the decidability of structural equivalence of DTDs and how tag minimization can be defined in a general way.

  • [Refereed Journal Articles:]
    1. A. Brueggemann-Klein. Regular Expressions into Finite Automata. Theoretical Computer Science, 120:197-213, 1993. A preliminary version of this paper.
    2. A. Brueggemann-Klein. Compiler-Construction Tools and Techniques for SGML Parsers: Difficulties and Solutions, Electronic Publishing--Origination, Dissemination and Design, (1996), to appear. A preliminary version of this paper.
    3. D.R. Raymond, F.W. Tompa, and D. Wood, From Data Representation to Data Model: Meta-Semantic Issues in the Evolution of SGML, Computer Standards & Interfaces, 18:25-36, 1996. A preliminary version appeared as Technical Report HKUST-CS95-32.
    4. A. Brueggemann-Klein and D. Wood, The Validation of SGML Content Models, Mathematical and Computer Modelling, 25:73-84, 1997. A preliminary version of this paper.
    5. A. Brueggemann-Klein and D. Wood, One-Unambiguous Regular Languages, Information and Computation, 140:229-253, 1998. A preliminary version of this paper.
    6. A. Brueggemann-Klein and D. Wood, Caterpillars: A Context Specification Technique, Markup Languages: Theory & Practice, 2(1):81-106, 2000. A preliminary version of this paper appeared as TCSC Research Report 2000-08.
    7. P. Kilpelainen, SGML & XML Content Models, Markup Languages: Theory & Practice, 1(2):53-76, 1999. A preliminary version of this paper appeared as University of Helsinki, Technical Report C-1998-12.
    8. P. Kilpelainen and D. Wood, SGML and XML Document Grammars and Exceptions, Information and Computation 169, (2001), 230-251. A preliminary version of this paper appeared as TCSC Research Report 1999-01.
    9. A. Brueggemann-Klein and D. Wood, The Regularity of Two-Way Nondeterministic Tree Automata Languages, International Journal of Foundations of Computer Science 13, (2002), 67-81. A preliminary version of this paper appeared as TCSC Research Report 2000-10.
    10. H.A. Cameron and D. Wood, Structural Equivalence of Extended Context-Free and E0L Grammars, submitted for publication, (2000).
    11. Anne Brueggemann-Klein, Stefan Hermann, and Derick Wood, The Visual Specification of Context, Markup Languages: Theory & Practice 3(2), (2001), 213-238. A preliminary version of this report appeared as TCSC Research Report 2001-06.

  • [Refereed Conference Presentations:]
    1. A. Brueggemann-Klein and D. Wood, On Deterministic Regular Languages, Proceedings of STACS 92, Springer-Verlag Lecture Notes in Computer Science 577, (1992), 173-184. A preliminary version of this paper.
    2. A. Brueggemann-Klein, Regular Expressions into Finite Automata, in I. Simon, editor, Latin 92, Springer-Verlag Lecture Notes in Computer Science 583, (1992), 87-98. A preliminary version of this paper.
    3. D.R. Raymond, F.W. Tompa and D. Wood, Markup Reconsidered, Workshop on Principles of Document Processing, (1992). A full, preliminary version of this paper.
    4. A. Brueggemann-Klein and D. Wood, The Validation of SGML Content Models, Workshop on Principles of Document Processing (PODP '92), (1992). A preliminary version of this paper.
    5. A. Brueggemann-Klein, Unambiguity of Extended Regular Expressions in SGML Document Grammars, in Th. Lengauer, editor, Algorithms--ESA 93, Springer-Verlag Lecture Notes in Computer Science 726, (1993), 73-84. A preliminary version of this paper.
    6. P. Kilpelainen and D. Wood, SGML and Exceptions, Principles of Document Processing, PODP '96, C. Nicholas and D. Wood (Eds.), Springer-Verlag Lecture Notes in Computer Science 1293, (1997), 39-49. A preliminary version appeared as Technical Report HKUST-CS96-30.
    7. A. Brueggemann-Klein, S. Hermann and D. Wood, Context and Caterpillars and Structured Documents, Principles of Digital Document Processing, PODDP '98, E. Munson, C. Nicholas and D. Wood (Eds.), Springer-Verlag Lecture Notes in Computer Science 1481, (1998), 1-9. A preliminary version of this paper appeared as TCSC Research Report 1998-04.
    8. A. Brueggemann-Klein, S. Hermann and D. Wood, Visually Specifying Context, Proceedings of the Third ICCC/IFIP Conference on Electronic Publishing, J.W.T. Smith, A. Ardo and P. Linde (Eds.), International Council for Computer Communications, Washington, DC, (1999), 103-118. The online conference proceedings and a preliminary version.
    9. A. Brueggemann-Klein, S. Hermann and D. Wood, The Visual Specification of Context, Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, Baltimore, Maryland, 28-36, 1999. A preliminary version of this paper appeared as TCSC Research Report 1998-12.
    10. A. Brueggemann-Klein and D. Wood, Caterpillars, Context, Tree Automata and Tree Pattern Matching, Proceedings of the Fourth International Conference on Developments in Formal Language Theory (DLT '99): Foundations, Applications and Perspectives, G. Rozenberg and W. Thomas (Eds), World Scientific Publishing Co. Pte. Ltd., Singapore, (2000), 270-285. A preliminary version of this paper appeared as TCSC Research Report 2000-02.
    11. A. Brueggemann-Klein and D. Wood, Regularly Extended Two-Way Nondeterministic Tree Automata, Automata Implementation: CIAA 2000, Springer-Verlag Lecture Notes in Computer Science 2088, (2001), 57-66. A preliminary version of this paper appeared as TCSC Research Report 2000-07.
    12. A. Brueggemann-Klein and D. Wood, Document Engineering with Extensible Abstract Document Structures, Principles of Digital Document Processing, PODDP '00, E. Munson and D. Wood (Eds.), Springer-Verlag Lecture Notes in Computer Science, (2000), to appear. The version that appears in the PODDP '00 proceedings has a different title; namely, A Conceptual Model for XML. A preliminary version of the paper (with the original title) appeared as TCSC Research Report 2000-09.

  • [Books and Chapters in Books:]
    1. D. Wood, Standard Generalized Markup Language: Mathematical and Philosophical Issues, in Computer Science Today, edited by Jan van Leeuwen (New York, NY: Springer-Verlag Lecture Notes in Computer Science 1000, 1995), 344-365. A preliminary version of this paper.

  • [Theses:]
    1. A. Brueggemann-Klein, Formal Models in Document Processing, Habilitationsschrift, Fakultaet Mathematik, Universitaet Freiburg, 1993. A preliminary version of this paper.
    2. Stefan Hermann, Design Specifications in The Digital Publication Process. PhD thesis, Fakultaet fuer Informatik, Technische Universitaet Muenchen, July 2000.

  • [Miscellaneous publications:]
    1. D.R. Raymond, F.W. Tompa and D. Wood, Markup Reconsidered, Department of Computer Science, University of Waterloo, Research Report CS-92-??, 1992, and Department of Computer Science, University of Western Ontario, Technical Report 356, 1992. A full, preliminary version of this paper.
    2. Jacques Andre, Anne Brueggemann-Klein, Richard Furuta and Vincent Quint, History of Document Processing, March, 1994. A version of this paper.
    3. A. Brueggemann-Klein and D. Wood, A Formal Definition of the XML Language, 2000. A preliminary version of this working paper.

      The original XML rules.

      The rewritten XML rules.

      The extracted syntax level XML rules.

    4. A. Brueggemann-Klein, M. Murata and D. Wood, Regular tree and regular hedge languages over unranked alphabets: Version 1, April 3, 2001. A preliminary version of this report appeared as TCSC Research Report 2001-05. The writing of this report is ongoing.

  • Last updated by Derick Wood, 27/01/2004