On Generalization and Implicit Bias of Gradient Methods in Deep Learning

Speaker:	Prof. Jian Li
		Institute for Interdisciplinary Information Sciences
		Tsinghua University

Title:		"On Generalization and Implicit Bias of Gradient Methods
		 in Deep Learning"

Date:		Monday 12 Aug 2019

Time:		2:00pm - 3:00pm

Venue:		Room 3598 (via lift no. 27/28), HKUST

Abstract:

Deep learning has enjoyed huge empirical success in recent years.
Although training a deep neural network is a highly non-convex optimization
problem, simple (stochastic) gradient methods are able to produce good 
solutions that minimize the training error and, more surprisingly, can
generalize well to out-of-sample data, even when the number of parameters
is significantly larger than the amount of training data. It is known that 
changing the optimization algorithm, even without changing the model,
changes the implicit bias and hence the generalization properties. What
bias do the optimization algorithms introduce for neural networks?
What ensures generalization in neural networks? In this talk, we attempt to 
answer the above questions by proving new generalization bounds and 
investigating the implicit bias of various gradient methods.

(1) We develop a new framework, termed Bayes-Stability, for proving 
algorithm-dependent generalization error bounds. Using the new framework, 
we obtain new data-dependent generalization bounds for stochastic gradient 
Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., 
with momentum, mini-batch and acceleration, Entropy-SGD). Our result 
recovers (and is typically tighter than) a recent result in Mou et al.
(2018) and improves upon the results in Pensia et al. (2018). Our 
experiments demonstrate that our data-dependent bounds can distinguish 
randomly labelled data from normal data, which provides an explanation
for the intriguing phenomenon observed in Zhang et al. (2017a).
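
For readers less familiar with SGLD, below is a minimal sketch (in Python)
of one update step; the toy quadratic loss, step size, and inverse
temperature beta are illustrative assumptions, not the exact setting
analyzed in the talk:

    import numpy as np

    rng = np.random.default_rng(0)

    def sgld_step(theta, grad_fn, lr=1e-2, beta=1e4):
        # One SGLD update: a (stochastic) gradient step plus Gaussian
        # noise whose scale is set by the step size lr and the inverse
        # temperature beta.
        noise = rng.normal(size=theta.shape) * np.sqrt(2.0 * lr / beta)
        return theta - lr * grad_fn(theta) + noise

    # Toy usage: drive a quadratic "training loss" L(theta) = ||theta||^2 / 2
    # (whose gradient is simply theta) towards zero.
    theta = np.ones(5)
    for _ in range(1000):
        theta = sgld_step(theta, grad_fn=lambda t: t)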

(2) We show that gradient descent converges to the max-margin direction for
homogeneous neural networks, including fully-connected and convolutional
neural networks with ReLU or LeakyReLU activations, generalizing previous
results for logistic regression with one-layer or multi-layer linear networks.
Finally, as the margin is closely related to robustness, we discuss the
potential benefits of training for longer to improve the robustness of the model.
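
As background for (2), for an L-homogeneous network f(theta; x), i.e. one
satisfying f(c * theta; x) = c^L f(theta; x) for all c > 0, the normalized
margin on a binary-labelled training set {(x_i, y_i)} can be written as
follows (a standard definition given here only for intuition; the precise
formulation used in the talk may differ):

    \gamma(\theta) = \min_i \frac{y_i \, f(\theta; x_i)}{\|\theta\|^{L}}

Training longer drives the unnormalized margins up without bound, while the
normalized margin gamma(theta) tends toward its maximum under the convergence
result above, which is one sense in which training longer may help robustness.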

******************
Biography:

Jian Li is currently an associate professor at the
Institute for Interdisciplinary Information Sciences (IIIS, previously ITCS),
Tsinghua University, headed by Prof. Andrew Yao. He received his BSc degree
from Sun Yat-sen (Zhongshan) University, China, his MSc degree in computer
science from Fudan University, China, and his PhD degree from the University
of Maryland, USA.
His major research interests lie in algorithm design and analysis, machine
learning, and databases. He has co-authored several research papers published
in major computer science conferences and journals, and he received the best
paper awards at VLDB 2009 and ESA 2010. He is also a recipient of the
"221 Basic Research Plan for Young Faculties" at Tsinghua University and the
"New Century Excellent Talents Award" from the Ministry of Education of China.