Processing and Management of Uncertain Information in Vague Databases

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Processing and Management of Uncertain Information in Vague Databases"

By

Mr. An LU


Abstract

Uncertain information is common in many database applications due to intensive 
data dissemination arising from different pervasive computing sources, such as 
the high volume data obtained from sensor networks and mobile communications. 
In this thesis, we study how to process and manage uncertain information in 
vague databases. Our work focuses on four aspects: modelling uncertain 
information by vague sets, maintaining consistency in vague databases, 
extending SQL to query vague relations, and mining vague association rules.

Modelling uncertain information by vague sets is the core of our work. We 
discuss how to measure vagueness in practice and the relationships between 
vague memberships and nulls. A new similarity measure of vague sets and the 
concepts of median membership (m) and imprecision membership (i) are proposed. 
Based on these two memberships, we define the notions of mi-overlap, mi-union 
and mi-intersection between vague sets and the concepts of vague relations and 
vague databases.
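
As a small illustration of the two memberships, the sketch below assumes the 
usual vague-set representation, in which an element has a true membership 
alpha and a false membership beta with alpha + beta <= 1, so that its 
membership lies in the interval [alpha, 1 - beta]. Under that assumption, the 
median membership can be taken as the midpoint of this interval and the 
imprecision membership as its width. These formulas and every name in the 
code (VagueValue, median, imprecision) are assumptions made for illustration, 
not definitions quoted from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VagueValue:
    """Hypothetical vague membership of one element.

    alpha: degree of evidence for membership (assumed representation)
    beta:  degree of evidence against membership (assumed representation)
    """
    alpha: float
    beta: float

    def __post_init__(self):
        # Both degrees must lie in [0, 1] and must not sum to more than 1.
        if not (0.0 <= self.alpha <= 1.0 and 0.0 <= self.beta <= 1.0):
            raise ValueError("alpha and beta must lie in [0, 1]")
        if self.alpha + self.beta > 1.0:
            raise ValueError("alpha + beta must not exceed 1")

    @property
    def median(self) -> float:
        # Assumed formula: midpoint of the interval [alpha, 1 - beta].
        return (self.alpha + (1.0 - self.beta)) / 2.0

    @property
    def imprecision(self) -> float:
        # Assumed formula: width of the interval [alpha, 1 - beta].
        return (1.0 - self.beta) - self.alpha


v = VagueValue(alpha=0.5, beta=0.25)
print(v.median, v.imprecision)
```

A crisp value (alpha + beta = 1) has imprecision 0 under this sketch, while a 
value with no evidence either way (alpha = beta = 0) has imprecision 1.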

Functional dependencies (FDs) and inclusion dependencies (INDs) are the most 
fundamental integrity constraints that arise in practice in relational 
databases. We utilize FDs and INDs to maintain the consistency of a vague 
database. First, we tackle the problem, given a vague relation r and a set of 
FDs F, of how to obtain the “best” approximation of r with respect to F when 
taking into account the median membership and the imprecision membership 
thresholds. Using these two thresholds of a vague set, we define a merge 
operation on r. Second, we consider, given a vague database d and a set of INDs 
N, how to obtain the minimal possible change in value-precision for d. Finally, 
we develop a vague chase procedure as a means to maintain consistency of d with 
respect to F and N.

Incorporating the notion of vague sets in relations, we propose vague SQL 
(VSQL), an extension of SQL for the vague relational model, and show that 
VSQL combines the capabilities of standard SQL with the power of 
manipulating vague relations. Although VSQL is a minimal extension intended 
to illustrate its usage, it allows users to formulate a wide range of queries 
on vague data.

Using vague sets, we address the limitations of traditional association rule 
(AR) mining, which only discovers the hidden relationship among the items that 
have been sold but ignores the items that are almost sold. For example, in many 
online shopping applications, such as Amazon and eBay, those items that have 
been browsed in detail or put into the basket but are not checked out (almost 
sold items) carry hesitation information, since customers are hesitating to buy 
them. We propose a new notion of vague association rules (VARs) and devise an 
efficient algorithm to mine the VARs.
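
To make the idea of hesitation information concrete, the sketch below 
summarises one item over a set of transactions as three fractions: how often 
it was bought, how often it was not touched at all, and how often it was 
"almost sold" (browsed in detail or put into the basket but not checked out). 
The status labels, the function name item_evidence and this particular way of 
counting are assumptions for illustration only; the sketch does not reproduce 
the VAR definitions or the mining algorithm of the thesis.

```python
def item_evidence(transactions, item):
    """Return (favour, against, hesitation) fractions for one item.

    transactions: list of dicts mapping item name -> "bought" or "almost";
                  an item missing from a dict was neither bought nor
                  considered in that transaction (assumed encoding).
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    bought = sum(1 for t in transactions if t.get(item) == "bought")
    almost = sum(1 for t in transactions if t.get(item) == "almost")
    absent = n - bought - almost
    return bought / n, absent / n, almost / n


# Hypothetical data: four customer sessions from an online shop.
sessions = [
    {"A": "bought", "B": "almost"},
    {"A": "bought"},
    {"A": "almost", "B": "bought"},
    {"B": "almost"},
]
print(item_evidence(sessions, "A"))
```

A traditional AR miner would see item A in only two of the four sessions; the 
third fraction keeps the session in which A was considered but not purchased.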


Date:			Thursday, 16 April 2009

Time:			10:30 a.m. - 12:30 p.m.

Venue:			Room 4475
 			Lifts 25-26

Chairman:		Prof. Fugee Tsung (IELM)

Committee Members:	Prof. Wilfred Ng (Supervisor)
 			Prof. Dik-Lun Lee
 			Prof. Ke Yi
 			Prof. Susheng Wang (ECON)
 			Prof. Qing Li (Comp. Sci., City Univ.)


**** ALL are Welcome ****