Keyword Search over Relational Data

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Keyword Search over Relational Data"

By

Mr. Alexander Markowetz


Abstract

Due to its ease of use, relational keyword search (R-KWS) has become 
increasingly popular. Compared to SQL, it has several benefits. First, 
R-KWS hides the schema from users and allows searching for combinations of 
terms without knowing in which data sources they appear. Second, its 
queries are easy to express. Third, many search tasks only become feasible 
through R-KWS. However, R-KWS is a young concept that holds many 
challenges. First, it lacks common semantics, and different 
implementations encounter various pitfalls. Second, its simplicity comes 
at a high computational cost. Specifically, R-KWS explores a vast search 
space, composed of all possible combinations of keyword occurrences in any 
attribute of every table. There are two general methodologies for query 
processing: (i) graph based (GB), traversing a materialized data graph, 
and (ii) operator based (OB), executing a set of relational operator 
trees. In both cases, a large portion of the computations are wasted on 
traversals/operators that fail to return results. Finally, while data 
streams become increasingly important to our information landscapes, R-KWS 
has been restrained to static data. The contribution of this paper is thus 
threefold. First, it provides homogenized semantics and matching query 
processing methods (GB as well as OB) that are formally shown to be 
correct. Second, it introduces a comprehensive framework for reachability 
indexing that quickly collapses the search space, and leads to significant 
savings. Third, it extends R-KWS to continuous queries over relational 
data streams. In particular, it proposes temporal semantics, as well as 
matching query processing techniques (GB as well as OB) that are further 
optimized. Extensive experiments demonstrate the high practicability of 
R-KWS, on tables as well as streams, and the significant benefits of the 
proposed optimizations.


Date:			Friday, 26 September 2008

Time:			2:30p.m.-4:30p.m.

Venue:			Room 4480
 			Lifts 25-26

Chairman:		Prof. Chi-Ying Tsui (ECE)

Committee Members:	Prof. Dimitris Papadias (Supervisor)
 			Prof. Lei Chen
 			Prof. Mounir Hamdi
 			Prof. Ajay Joneja (IELM)
 			Prof. Kian-Lee Tan (Comp. Sci.,
 					    National Univ. of Singapore)


**** ALL are Welcome ****