Sublinear Algorithms for Massive Data

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Sublinear Algorithms for Massive Data"

By

Mr. Di CHEN


Abstract

Sublinear algorithms address the rapid growth in data volume with a simple 
yet powerful premise, that useful tasks can be performed with even fewer 
resources than required to simply store the data.

This thesis studies randomized algorithms for massive data. We either 
devise new algorithms, or improve analysis on existing algorithms, 
resulting in meaningful theoretical guarantees with sublinear space or 
communication.

First, we present an algorithm that uses sublinear communication to 
perform set reconciliation under a 'noisy data' model, where two data 
points shall be considered 'the same' when the distance between them is 
small, modelling tiny perturbations caused in data due to some form of 
noise.

The second is a 1-pass streaming algorithm for estimating the number of 
distinct entities in the same noisy data model. Geometrically, this may be 
seen as counting sparsely placed clusters of small diameter, using space 
that is sublinear in the number of clusters.

Finally, we give an improved analysis of random Fourier features (RFF) for 
the Gaussian kernel, a popular data-oblivious embedding of kernel 
functions. In contrast to competing techniques that are typically 
data-dependent, RFF can be used under the data stream setting. Our work 
justifies the use of RFF in a variety of learning problems under the 
Gaussian kernel distance.


Date:			Friday, 9 December 2016

Time:			9:30am - 11:30am

Venue:			Room 2129C
  			Lift 19

Chairman:		Prof. Wai-Ho Mow (ECE)

Committee Members:	Prof. Mordecai Golin (Supervisor)
 			Prof. Ke Yi (Supervisor)
  			Prof. Sunil Arya
  			Prof. Raymond Wong
  			Prof. Lancelot James (ISOM)
  			Prof. Xiaoming Sun (Inst. of Comp. Tech., CAS)


**** ALL are Welcome ****