Cross-Column Redundancy: Concept, Detection and Application

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Cross-Column Redundancy: Concept, Detection and Application"

By

Mr. Hao LIU


Abstract

Nowadays, more and more data from heterogeneous data sources are integrated 
into various data warehouse systems for analytical purposes. As a result, data 
columns in such systems often exhibit redundancy, which we term Cross-Column 
Redundancy (CCR). CCR indicates high similarity or correlation between columns 
and therefore can be exploited for data management and business intelligence. 
However, due to the combinatorial nature of CCR, detecting it automatically 
is computationally challenging.

In this thesis, we define three kinds of CCR, develop efficient algorithms for 
CCR detection, and leverage CCR to compress data. In particular, we focus on a 
kind of CCR, called Soft Concatenation Mapping (SCM), where one column can be 
derived from another or several other columns by transformation and 
concatenation. We prove that SCM detection is NP-hard and propose approximate 
algorithms. Furthermore, we leverage CCR for database compression and develop 
Cuttle, a column storage system that integrates our cross-column compression 
schemes into existing database systems transparently. Our experiments on 
real-world datasets show that Cuttle reduces data storage by half and 
improves query processing performance by 20%. In addition, we present the 
design and implementation of UStore, a customized version of Cuttle tailored 
for UnionPay. We use UnionPay's inter-bank transaction settlement platform 
(ITSP) as a running example to illustrate the core components of UStore. To 
date, UStore has been deployed to process over 15 years' bankcard transaction 
data (over 3 PB in plain text format) in UnionPay.
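
To make the idea of a Soft Concatenation Mapping concrete, the following is a minimal sketch rather than the thesis's detection algorithm. It only checks one given candidate mapping and does not search for mappings. The column names, the sample rows, the helper scm_support, and the reading of "soft" as "holds for most rows" are all assumptions made for illustration.

```python
from typing import Callable, Sequence

Row = dict[str, str]
Part = tuple[str, Callable[[str], str]]  # (source column, transformation)


def scm_support(rows: Sequence[Row], target: str, parts: Sequence[Part]) -> float:
    """Return the fraction of rows in which the target column equals the
    concatenation of the transformed source columns (hypothetical helper)."""
    if not rows:
        return 0.0
    hits = 0
    for row in rows:
        derived = "".join(fn(row[col]) for col, fn in parts)
        if derived == row[target]:
            hits += 1
    return hits / len(rows)


# Invented sample data: txn_id is usually the date (dashes removed)
# followed by the branch code; the third row is an exception.
rows = [
    {"date": "2017-07-24", "branch": "HK01", "txn_id": "20170724HK01"},
    {"date": "2017-07-25", "branch": "SH02", "txn_id": "20170725SH02"},
    {"date": "2017-07-26", "branch": "BJ03", "txn_id": "20170726XX99"},
]

# Candidate mapping: txn_id = strip_dashes(date) + branch
parts = [
    ("date", lambda s: s.replace("-", "")),
    ("branch", lambda s: s),
]

support = scm_support(rows, "txn_id", parts)

# Rows that violate the mapping; a compression scheme built on such a
# mapping could, under this assumption, store only these exceptions.
exceptions = [
    r for r in rows
    if "".join(fn(r[c]) for c, fn in parts) != r["txn_id"]
]
```

In this sketch the mapping holds for two of the three rows, so the derived column is redundant for those rows and only the one exception would need to be kept explicitly.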


Date:			Monday, 24 July 2017

Time:			5:00pm - 7:00pm

Venue:			Room 2130B
 			Lifts 19

Chairman:		Prof. Guanghao Chen (CIVL)

Committee Members:	Prof. Lionel Ni (Supervisor)
 			Prof. Qiong Luo (Supervisor)
 			Prof. Shing-Chi Cheung
 			Prof. Lei Chen
 			Prof. Jingshen Wu (MAE)
 			Prof. Qing Li (Comp. Sci., CityU)


**** ALL are Welcome ****