Large Scale Optimization Methods for Machine Learning

PhD Thesis Proposal Defence


Title: "Large Scale Optimization Methods for Machine Learning"

by

Mr. Shuai ZHENG


Abstract:

Dealing with large-scale datasets has been a major challenge for optimization 
methods in machine learning. Typical machine learning problems can be cast as 
the minimization of an objective over some underlying data distribution. To 
exploit the data structure and improve generalization performance, a complex 
regularizer may be added to the objective. The stochastic gradient descent 
(SGD) method has been widely viewed as an ideal approach for large-scale 
machine learning problems, whereas the conventional batch gradient method 
typically falters. Despite its flexibility and scalability, the stochastic 
gradient suffers from high variance, which impedes training. This thesis 
proposal presents a number of new optimization algorithms for tackling 
large-scale machine learning tasks.
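
To make the setting concrete, the following sketch (plain NumPy on synthetic 
least-squares data; all names and numbers are illustrative, not taken from the 
thesis) contrasts one full batch gradient step with one stochastic gradient 
step computed from a single sampled example; the gap between the two gradients 
is the variance that the proposed methods aim to control.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 10                      # number of examples, feature dimension
    X = rng.standard_normal((n, d))      # synthetic features (stand-in for real data)
    y = rng.standard_normal(n)           # synthetic targets
    w = np.zeros(d)                      # model parameters
    lr = 0.01                            # step size

    # Batch gradient step: uses all n examples (accurate but costly per step).
    batch_grad = X.T @ (X @ w - y) / n
    w_batch = w - lr * batch_grad

    # Stochastic gradient step: uses one sampled example (cheap but noisy).
    i = rng.integers(n)
    stoch_grad = (X[i] @ w - y[i]) * X[i]
    w_sgd = w - lr * stoch_grad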

We first propose a fast and scalable stochastic ADMM method for solving 
empirical risk minimization problems with complex nonsmooth regularizers such 
as the graph lasso and group lasso, and a stochastic continuation method for 
convex problems in which both the loss and the regularizer are nonsmooth. 
While existing approaches rely crucially on the assumption that the dataset 
is finite, we introduce two SGD-like algorithms for finite sums with infinite 
data. The proposed algorithms outperform existing methods in terms of both 
iteration complexity and storage. Inspired by recent advances in adaptive 
gradient methods for training deep neural networks, we present a fast and 
powerful optimization algorithm based on the 
follow-the-proximally-regularized-leader (FTPRL) method. The new algorithm 
significantly outperforms existing approaches, advancing the state of the 
art. Recently, there has been growing interest in distributed training, as it 
can be difficult to store a very large dataset on a single machine. In light 
of this, we develop a distributed asynchronous gradient-based method that 
improves upon existing distributed machine learning algorithms and enjoys a 
fast linear convergence rate. Finally, the scalability of large-scale 
distributed training of neural networks is often limited by communication 
overhead. Motivated by recent advances in optimization with compressed 
gradients, we propose a communication-efficient distributed SGD with error 
feedback. The proposed method provably converges to a stationary point at the 
same asymptotic rate as distributed synchronous SGD.
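
To illustrate the error-feedback idea behind the last contribution, the 
following minimal single-worker sketch (with a simple top-k sparsifier; the 
function and parameter names are illustrative, not the algorithm proposed in 
the thesis) compresses each gradient before it would be communicated and 
carries the discarded part forward into the next step.

    import numpy as np

    def topk(v, k):
        """Keep the k largest-magnitude entries of v, zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(0)
    d, k, lr = 100, 10, 0.1
    w = rng.standard_normal(d)           # parameters
    e = np.zeros(d)                      # accumulated compression error (error feedback)

    for t in range(50):
        g = 2 * w + 0.01 * rng.standard_normal(d)  # noisy gradient of a toy quadratic
        p = g + e                        # correct the gradient with the stored error
        g_hat = topk(p, k)               # compressed message that would be communicated
        e = p - g_hat                    # remember what the compression dropped
        w = w - lr * g_hat               # apply the (compressed) update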


Date:			Monday, 1 April 2019

Time:			2:00pm - 4:00pm

Venue:			Room 5508 (lifts 25/26)

Committee Members:	Prof. James Kwok (Supervisor)
			Prof. Dit-Yan Yeung (Chairperson)
			Prof. Daniel Palomar (ECE)
			Prof. Tong Zhang (MATH)

**** ALL are Welcome ****