The Hong Kong University of Science and Technology Department of Computer Science PhD Thesis Defence "Efficient Structural Join Processing Algorithms" By Mr. Kaiyang Liu Abstract Extensible Markup Language (XML) has become the de facto standard for data representation, exchange and publishing over the Internet. As more and more XML data is gathered and processed by database management systems, we need innovative approaches to efficiently manage and query it. However, XML queries possess some unique features, which distinguish them from the traditional RDBMS queries and pose a challenge to the XML research community. This dissertation investigates the problem of how to improve the XML query performance by considering the unique features of XML queries. By fully exploiting these features, we are able to propose some novel algorithms to significantly improve XML query performance. We first investigate the issue of optimizing XML queries without considering the workload. A basic kind of XML query is the containment query (or structural query), which retrieves all descendants of a specific XML element. Structural queries are important because they constitute the building block of various complex XML queries. Among proposed XML query processing techniques for structural queries, structural join outperforms most other graph-traversal based approaches as it minimizes the number of nodes accessed to evaluate a structural query. In the first part of this dissertation we show that general containment queries, involving conditions on attributes, possess inherent multi-dimensional characteristics that are not captured by existing XML query processing methods based on 1D ordering. Motivated by this, we index XML data with spatial access methods and develop efficient algorithms to minimize the query cost. Extensive experimental evaluations confirm that our algorithms provide significant performance gains for numerous query types. Our next goal is to further improve the query performance by considering the query workload. All existing indices for structural join focus on indexing the encoding information of XML elements so as to improve the XML query performance. Consequently, such indices utilize only data characteristics, but ignore query characteristics that are important for the further improvement of the query performance. In this thesis, we propose AC-tree (Adaptive Cluster B+-tree), which is a fully workload-aware structural join index, to provide a simple, but efficient way to exploit the XML query characteristics for performance improvement. To the best of our knowledge, AC-tree is the first structural join index to take the workload into consideration. An extensive set of experiments confirm that AC-tree outperforms competitors significantly for both the simple XML queries and branching queries. Date: Tuesday, 30 November 2004 Time: 10:00a.m.-12:00noon Venue: Room 2404 Lifts 17-18 Chairman: Prof Li Xin Wu (MATH) Committee Members: Prof. Frederick H. Lochovsky (Supervisor) Prof. Dik Lun Lee Prof. Dimitrios Papadias Prof. David Bodoff (ISMT) Prof. Elisa Bertino (Computer Sciences & CERIAS, Purdue University) **** ALL are Welcome ****