Constructing Synopses for Query Answering

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Constructing Synopses for Query Answering"

By

Mr. Yuan QIU


Abstract

Synopses, also known as summaries or sketches, are data structures computed 
from a dataset to support analytical queries. Traditionally, the purpose of 
using synopses is efficiency. By allocating some extra space, queries can be 
answered approximately but efficiently by a precomputed synopsis without 
referencing the original dataset. Recently, privacy has been a new concern 
besides efficiency. As the current standard for privacy analysis, differential 
privacy (DP) enjoys the property that post-processing a DP mechanism brings no 
extra privacy loss. This is in line with the principle of synopses: by 
constructing a differentially private synopsis from the dataset, many queries 
can be answered accurately while satisfying a predefined privacy constraint at 
construction time.

In this thesis, we focus on three problems related to synopses construction We 
start with the cardinality estimation problem for Select-Project-Join queries 
under the non-private setting. We propose a synopsis constructed through 
weighted distinct sampling according to near-optimal sampling rates, and show 
an efficient way of constructing the it. We demonstrate its optimality through 
both theoretical and empirical evaluation. We then consider numerical queries 
on static datasets under differential privacy, which capture Select-Aggregate 
queries in database systems. A private synopsis is designed to achieve both 
query-specific and instance-specific error, which outperforms existing 
solutions both in theory and in practice. Finally, We extend the setting to 
streaming data. A private synopsis is designed for linear queries, which are 
generalizations of Select queries in databases. We show our synopsis for 
fully-dynamic streams has asymptotically the same error as answering queries in 
static settings, ignoring polylogarithmic factors in the length of the stream. 
The results can also be extended to arbitrary union-preserving queries.


Date:			Wednesday, 10 August 2022

Time:			10:00am - 12:00noon

Zoom Meeting:		https://hkust.zoom.us/j/5688510375

Chairperson:		Prof. Juanyi XU (ECON)

Committee Members:	Prof. Ke YI (Supervisor)
 			Prof. Siu Wing CHENG
 			Prof. Mordecai GOLIN
 			Prof. Yi YANG (ISOM)
 			Prof. Hubert CHAN (HKU)


**** ALL are Welcome ****