Techniques and Applications of Random Sampling on Massive Data

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Techniques and Applications of Random Sampling on Massive Data"

By

Mr. Lu WANG


Abstract

Living in the era of big data, we often need to process and analyze data 
sets that have never been so large and fast-growing. Random sampling has 
thus received much attention as an effective tool for turning big data 
"small".  It allows us to significantly reduce the size of input while 
maintaining the main features of the original data set we need. It is also 
easy to trade off between the computation complexity and the accuracy of 
the result, by tweaking the sample size.

Although random sampling is a classical problem with a long history, it 
has received revived attention lately motivated by new applications as 
well as new constraints in the big data era. This thesis presents several 
new techniques and applications of random sampling:  (1) a new randomized 
streaming algorithm for finding approximate quantiles in a data stream, 
which achieves the smallest space complexity of all such algorithms; (2) 
an augmented B-tree index that, for any given range query, returns a 
sampling-based summary containing the quantiles and heavy hitters of all 
tuples in the query range; (3) an sample-augmented R-tree that, given any 
range query, returns random samples from the query range in an online 
fashion. Apart from the description and analysis of each algorithm we 
propose, experimental results are also provided, confirming the advantages 
of the new algorithms. Finally, we showcase a system for large-scale 
spatio-temporal data analysis using the developed techniques.


Date:			Thursday, 21 May 2015

Time:			10:00am - 12:00noon

Venue:			Room 3598
 			Lifts 27/28

Chairman:		Prof. Andrew Poon (ECE)

Committee Members:	Prof. Ke Yi (Supervisor)
 			Prof. Siu-Wing Cheng
 			Prof. Dimitris Papadias
 			Prof. Xinghua Zheng (ISOM)
 			Prof. Guoliang Li (Tsinghua Univ.)


**** ALL are Welcome ****