The Hong Kong University of Science and Technology
Department of Computer Science


PhD Thesis Defence


"Efficient Structural Join Processing Algorithms"

By

Mr. Kaiyang Liu


Abstract

Extensible Markup Language (XML) has become the de facto standard for data
representation, exchange and publishing over the Internet. As more and
more XML data is gathered and processed by database management systems, we
need innovative approaches to efficiently manage and query it. However,
XML queries possess some unique features, which distinguish them from the
traditional RDBMS queries and pose a challenge to the XML research
community. This dissertation investigates the problem of how to improve
the XML query performance by considering the unique features of XML
queries. By fully exploiting these features, we are able to propose some
novel algorithms to significantly improve XML query performance.

We first investigate the issue of optimizing XML queries without
considering the workload. A basic kind of XML query is the containment
query (or structural query), which retrieves all descendants of a specific
XML element. Structural queries are important because they constitute the
building block of various complex XML queries. Among proposed XML query
processing techniques for structural queries, structural join outperforms
most other graph-traversal based approaches as it minimizes the number of
nodes accessed to evaluate a structural query. In the first part of this
dissertation we show that general containment queries, involving
conditions on attributes, possess inherent multi-dimensional
characteristics that are not captured by existing XML query processing
methods based on 1D ordering. Motivated by this, we index XML data with
spatial access methods and develop efficient algorithms to minimize the
query cost. Extensive experimental evaluations confirm that our algorithms
provide significant performance gains for numerous query types.

Our next goal is to further improve the query performance by considering
the query workload. All existing indices for structural join focus on
indexing the encoding information of XML elements so as to improve the XML
query performance. Consequently, such indices utilize only data
characteristics, but ignore query characteristics that are important for
the further improvement of the query performance. In this thesis, we
propose AC-tree (Adaptive Cluster B+-tree), which is a fully
workload-aware structural join index, to provide a simple, but efficient
way to exploit the XML query characteristics for performance improvement.
To the best of our knowledge, AC-tree is the first structural join index
to take the workload into consideration. An extensive set of experiments
confirm that AC-tree outperforms competitors significantly for both the
simple XML queries and branching queries.


Date:			Tuesday, 30 November 2004

Time:			10:00a.m.-12:00noon

Venue:			Room 2404
			Lifts 17-18

Chairman:		Prof Li Xin Wu (MATH)

Committee Members:	Prof. Frederick H. Lochovsky (Supervisor)
			Prof. Dik Lun Lee
			Prof. Dimitrios Papadias
			Prof. David Bodoff (ISMT)
			Prof. Elisa Bertino (Computer Sciences & CERIAS,
					     Purdue University)



**** ALL are Welcome ****