Analyzing Data with k-regret Query and Diversification

PhD Qualifying Examination


Title: "Analyzing Data with k-regret Query and Diversification"

by

Mr. Min XIE


Abstract:

Extracting interesting tuples from a large database is an important problem in 
multi-criteria decision making. Two representative queries were proposed in the 
literature: top-k queries and skyline queries. A top-k query requires users to 
specify their utility functions beforehand and then returns exactly k tuples to 
users. A skyline query does not require any utility function from users but we 
cannot predict its output size in advance.

Recently, a k-regret query, which tries to minimize a criterion called the 
maximum regret ratio, was proposed to overcome the deficiencies of both top-k 
queries and skyline queries. In other words, a k-regret query does not require 
any utility function from users, and the output size is controllable. Various 
algorithms for performing k-regret queries have been proposed in the literature 
and they can be divided into two categories: the CUBE approach and the Greedy 
approach. The CUBE approach was proven to have a theoretical guarantee on the 
maximum regret ratio of the set of resulting tuples but its empirical quality 
is not satisfiable enough. The Greedy approach was shown to return a set with a 
better maximum regret ratio in practice but it does not have any theoretical 
guarantee. Depending on the way that is used to determine the tuple to be 
selected in the Greedy approach, the Greedy approach can be further divided 
into two approaches, namely the LP-Greedy approach and the Geo- Greedy 
approach. The LP-Greedy approach formulates the k-regret query as a number of 
linear programs and performs the query by solving the linear programs. The 
Geo-Greedy approach avoids the usage of the time-consuming linear programs and 
solves the k-regret query by making use of a number of geometry properties.

Meanwhile, result diversification, which takes both relevance and diversity 
into consideration, has attracted a lot of attentions. It aims at improving the 
quality of the result retrieved by the user query. Different definitions for 
diversification were proposed in the literature: DisC diversity, diversified 
top- k set and top k-similar diversification set. DisC diversity aims at 
finding a minimum set that satisfies the following two conditions: (1) any two 
tuples in the set are not similar to each other; (2) for each tuple in the 
database, there is a similar tuple in the selected set. In diversified top-k 
set, each tuple is associated with a score. Diversified top-k set is the set of 
at most k tuples, such that no two tuples are similar with each other, and the 
total score of the set is maximized. Top k-similar diversification set is the 
set with the maximum objective value which is defined in term of both relevance 
and diversity.

In this survey, we first review the definitions of the k-regret query and 
present the algorithms for solv- ing the k-regret query. We then study 
different instances of the diversification problem. Besides, we discuss the 
possible relationship between the k-regret query and the diversification 
problem. Finally, we conclude this survey by giving some future research 
directions.


Date:			Thursday, 9 March 2017

Time:                  	4:00pm - 6:00pm

Venue:                  Room 3494
                         Lifts 25/26

Committee Members:	Dr. Raymond Wong (Supervisor)
 			Prof. Shing-Chi Cheung (Chairperson)
 			Prof. Lei Chen
 			Prof. Frederick Lochovsky


**** ALL are Welcome ****