Query Processing in Uncertain Graphs

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Query Processing in Uncertain Graphs"

By

Mr. Panagiotis PARCHAS


Abstract

Data in several applications can be represented as an uncertain graph, 
whose edges are labeled with a probability of existence. Exact query 
processing on uncertain graphs is prohibitive for most applications, as it 
involves evaluation over an exponential number of instantiations. Thus, 
typical approaches employ Monte-Carlo sampling, which i) draws a number of 
possible graphs (samples), ii) evaluates the query on each of them, and 
iii) aggregates the individual answers to generate the final result. 
However, this approach can also be extremely time consuming for large 
uncertain graphs commonly found in practice. To overcome the high cost, in 
this thesis we propose two approaches. The first extracts deterministic 
representative instances that capture structural properties of the 
uncertain graph. The query and mining tasks can then be efficiently 
processed using deterministic algorithms on these representatives. The 
second approach sparsifies the uncertain graph (i.e., reduces the number 
of its edges) and redistributes its probabilities, minimizing the 
information loss. Then, Monte-Carlo sampling applied to the reduced graph 
becomes much more efficient.

For the first approach, in order to maintain data utility, the 
representative instance should preserve structural characteristics of the 
uncertain graph. We start with representatives that capture the expected 
vertex degrees, as this is a fundamental property of the graph topology. 
We then generalize the notion of vertex degree to the concept of n-clique 
cardinality, i.e., the number of cliques of size n that contain a vertex. 
For the first problem, we propose two methods: Average Degree Rewiring 
(ADR), which is based on random edge rewiring, and Approximate B-matching 
(ABM), which applies graph matching techniques. For the second problem, we 
develop a greedy approach and a game theoretic framework. We 
experimentally demonstrate, with real uncertain graphs, that indeed the 
representative instances can be used to answer, efficiently and 
accurately, queries based on several metrics such as shortest path 
distance, clustering coefficient and betweenness centrality.

For the second approach, inspired by the literature in deterministic graph 
sparsification, we aim at sparsified uncertain graphs that maintain the 
expected cut sizes of the original graph. In addition, in order to reduce 
the uncertainty of the graph and the variance of the performed queries, we 
aim at decreasing the entropy of the input graph. To achieve these goals, 
we propose a framework that consists of two phases: The first (backbone 
generation), creates a deterministic backbone graph. The second, 
(probability assignment), generates probabilities for the edges of the 
backbone graph that capture the expected cuts, while avoiding the 
probability values that incur high entropy. Based on this framework, we 
propose two methods: Gradient Decent Backbone (GDB), which assigns 
probabilities based on gradient decent, and Expectation Maximization 
Degree (EMD), which iteratively updates the backbone graph and assigns 
probabilities until convergence. Moreover, we modify state-of-the-art 
methods of deterministic graph sparsification to comply with the uncertain 
setting, and we experimentally demonstrate that they underperform compared 
to the proposed techniques in a variety of graph related queries, 
including shortest path distance, page rank, clustering coefficient and 
reliability.


Date:			Tuesday, 24 January 2017

Time:			2:00pm - 4:00pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Li Qiu (ECE)

Committee Members:	Prof. Dimitris Papadias (Supervisor)
 			Prof. Pedro Sander
 			Prof. Raymond Wong
 			Prof. Xiaowei Zhang (IELM)
 			Prof. Jeffrey Xu Yu (Sys. Engg. & Engg. Mgmt.,
 					     CUHK)


**** ALL are Welcome ****