Research at HKUST Database Group

Our research focuses fall into three main areas:

Specific research projects often span all three focus areas and also delve into other applications of database systems. Below we elaborate on some of the research areas and issues that our work encompasses.

Information retrieval addresses problems associated with content extraction, indexing, querying and ranking of unstructured text. Current research focuses on the retrieval of web pages, including peer-to-peer search systems and the application of machine learning methods to improve retrieval quality and support personalized search.
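As a rough illustration of the indexing, querying and ranking steps mentioned above, the following is a minimal sketch of an inverted index with TF-IDF scoring. It is a textbook baseline and not a description of the group's systems; the function names (build_index, rank) and the sample documents are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(index, n_docs, query):
    """Order documents by summed TF-IDF over the query terms (best first)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Ties are broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample collection, for illustration only.
docs = {
    "d1": "spatial index for range search",
    "d2": "web search ranking and web indexing",
    "d3": "xml twig join processing",
}
index = build_index(docs)
results = rank(index, len(docs), "web search")
print(results)
```

Here d2 ranks first because it contains the rarer term "web" twice, while d1 matches only the more common term "search".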

Hidden web resource discovery and integration addresses issues related to finding, querying, extracting and integrating data from back-end web database systems. Research issues include developing techniques for automatically locating such web sites, for inducing wrappers to extract the data from web pages produced by such sites, for automatically labeling the extracted data and for integrating the data extracted from various sites.

Web query processing investigates new challenges posed by data-intensive web applications, such as database-backed web sites. The research focuses on adaptive and distributed query processing (including caching) to cope with the scale and heterogeneity of such applications in a volatile networked environment.

Web mining encompasses research in web content, web structure and web usage mining. Web content mining investigates techniques for extracting knowledge from the content of documents or their descriptions. Web structure mining looks at ways to infer knowledge from the organization of the WWW and from the links between references and referents on the web. Finally, web usage mining investigates methods for extracting interesting patterns from web server logs.

XML data management poses new challenges related to support for efficient processing of structural queries. The basic database query processing issues need to be re-examined in this new context, including efficient processing and index support for primitive structural operations, estimation of the result size of such operations, and optimization techniques for queries involving multiple operations, such as twig joins.
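One primitive structural operation of the kind described above is the ancestor-descendant join. The sketch below assumes a region (interval) encoding, in which each element gets a (start, end) pair from a depth-first numbering, so that a contains d exactly when a.start < d.start and d.end < a.end. The stack-based merge is one well-known way to evaluate such a join; the function names (label, structural_join) and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def label(root):
    """Assign each element a (start, end, tag) region by depth-first numbering."""
    regions = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child)
        counter += 1
        regions.append((start, counter, node.tag))

    visit(root)
    return sorted(regions)  # sorted by start, i.e. document order


def structural_join(regions, anc_tag, desc_tag):
    """Return all (ancestor, descendant) region pairs for anc_tag//desc_tag."""
    stack, out = [], []
    for start, end, tag in regions:
        # Drop ancestors that closed before the current element opened.
        while stack and stack[-1][1] < start:
            stack.pop()
        # Every region still on the stack encloses the current element.
        if tag == desc_tag:
            out.extend(((a_start, a_end), (start, end)) for a_start, a_end in stack)
        if tag == anc_tag:
            stack.append((start, end))
    return out


# Hypothetical sample document, for illustration only.
doc = "<book><section><title/><section><title/></section></section><title/></book>"
pairs = structural_join(label(ET.fromstring(doc)), "section", "title")
print(pairs)
```

The join makes a single pass over the elements in document order, which is why this style of algorithm is attractive compared with testing every pair. The title that sits directly under book produces no pair because both section regions have closed by then.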

Mobile database research examines traditional database problems, including indexing, caching and query processing, in wireless environments where bandwidth and client resources are limited. In addition, novel problems caused by data and client mobility need to be addressed.

Spatial databases deal with the management of multi-dimensional data. Queries of interest involve range search, nearest neighbors and spatial joins. Spatio-temporal databases deal with the efficient manipulation of objects moving in space. Topics of interest include access methods for historical information retrieval and future prediction, spatio-temporal data warehouses and query processing techniques.
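To make two of the query types named above concrete, the following minimal sketch evaluates a range search and a k-nearest-neighbor query by linear scan over a small point set. It only defines what the queries return; a spatial access method such as an R-tree would answer them without inspecting every point. The function names and the sample points are assumptions made for this example.

```python
import heapq
import math


def range_search(points, lo, hi):
    """Return the points inside the axis-aligned box [lo, hi] (inclusive)."""
    return [p for p in points
            if all(l <= c <= h for c, l, h in zip(p, lo, hi))]


def nearest_neighbors(points, q, k):
    """Return the k points closest to q by Euclidean distance, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


# Hypothetical sample data, for illustration only.
points = [(1, 1), (2, 5), (4, 3), (6, 6), (7, 2)]
in_window = range_search(points, (1, 1), (4, 4))
closest = nearest_neighbors(points, (5, 3), 2)
print(in_window, closest)
```

Both functions work for any number of dimensions, since they only compare or combine coordinates pairwise.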

Enterprise systems address the development, integration, deployment and quality assurance of e-business applications. Research issues include identification of architecture, management of design aspects, services integration and software componentization.