Learning Static and Dynamic Sparse Structures for Deep Neural Networks

PhD Thesis Proposal Defence


Title: "Learning Static and Dynamic Sparse Structures for Deep Neural Networks"

by

Mr. Zhourong CHEN


Abstract:

In the past decade, deep neural networks (DNNs) have produced superior results 
in a wide range of machine learning applications. However, the structures of 
these networks are usually dense and handcrafted by human experts. Learning 
sparse structures from data for DNNs remains a challenging problem. In this 
thesis, we investigate learning two types of sparse structures for DNNs. The 
first are static sparse structures, which are learned from data and then fixed 
across input samples; the second are dynamic sparse structures, which are 
conditioned on each individual input sample. Learning these sparse structures 
is expected to alleviate overfitting, reduce time and space complexity, and 
improve the interpretability of deep models.

For learning static sparse structures, we propose two methods for standard 
feedforward neural networks (FNNs): Tree Skeleton Expansion (TSE) and Tree 
Receptive Field Growing (TRFG). Both methods rely on learning probabilistic 
graphical models (PGMs) to identify groups of strongly correlated units, and 
focus model capacity on those strong correlations. In TSE, we construct a 
tree-structured PGM as a skeleton and expand the connections in the skeleton 
to form a deep sparse structure for FNNs. TSE is fast, and the resulting 
sparse models can achieve better performance with far fewer parameters than 
dense FNNs. In TRFG, we learn deep structures in a layer-wise manner: for each 
layer of units, we build a tree-structured PGM and construct the next layer by 
introducing a unit for each local region in the PGM. TRFG can efficiently 
capture the salient correlations at different layers and learns sparse models 
with better performance and interpretability than dense FNNs.
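
To make the shared idea concrete, below is a minimal Python sketch, assuming 
the tree-structured PGM is a Chow-Liu tree built from pairwise mutual 
information between binarized unit activations. The function names, the 
binarization scheme, and the one-unit-per-tree-edge region choice are 
illustrative assumptions, not details taken from the thesis.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def pairwise_mi(acts):
    # Mutual information between every pair of units, after binarizing
    # each unit's activations at its mean. acts: (n_samples, n_units).
    b = (acts > acts.mean(axis=0)).astype(float)
    n, d = b.shape
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            joint = np.histogram2d(b[:, i], b[:, j], bins=2)[0] / n
            outer = joint.sum(1, keepdims=True) @ joint.sum(0, keepdims=True)
            nz = joint > 0
            mi[i, j] = mi[j, i] = (joint[nz] * np.log(joint[nz] / outer[nz])).sum()
    return mi

def chow_liu_edges(acts):
    # Chow-Liu tree = maximum spanning tree over pairwise mutual information.
    # scipy only provides a minimum spanning tree, so negate the weights; the
    # small epsilon keeps zero-MI pairs from being dropped as absent edges.
    w = -(pairwise_mi(acts) + 1e-9)
    np.fill_diagonal(w, 0.0)
    tree = minimum_spanning_tree(w).toarray()
    return list(zip(*np.nonzero(tree)))

def trfg_layer_mask(edges, n_units):
    # TRFG-style construction: one next-layer unit per local region of the
    # tree (here, per edge), connected only to the units in that region.
    mask = np.zeros((len(edges), n_units))
    for k, (i, j) in enumerate(edges):
        mask[k, [i, j]] = 1.0
    return mask  # multiply element-wise into a weight matrix to sparsify it

acts = np.random.randn(1000, 8)                 # toy activations: 1000 samples, 8 units
mask = trfg_layer_mask(chow_liu_edges(acts), 8) # 7 new units, each with 2 connections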

For learning dynamic sparse structures, the central problem is how to 
configure the network structure for each individual input sample on the fly. 
We propose a new framework for this problem in convolutional neural networks 
(CNNs), called GaterNet. GaterNet uses a dedicated sub-network to generate 
binary gates from the input, and prunes filters of the main CNN for that 
specific input based on the gate values. The result is a dynamic CNN that, in 
effect, processes different samples with different sparse structures. Our 
preliminary experiments show that, with the help of this dynamic pruning, the 
generalization performance of the CNN can be significantly improved.
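
Below is a minimal PyTorch sketch of the gating mechanism as described above: 
a small gater sub-network maps the input to one binary gate per filter of the 
main CNN, and the gates zero out filters per sample. The layer sizes and the 
straight-through binarization are my own illustrative choices, not details 
from the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Gater(nn.Module):
    # Tiny sub-network: input image -> one logit per backbone filter.
    def __init__(self, n_filters):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, n_filters)

    def forward(self, x):
        soft = torch.sigmoid(self.fc(self.features(x).flatten(1)))
        hard = (soft > 0.5).float()
        # Straight-through: forward pass uses hard 0/1 gates, backward pass
        # flows gradients through the soft sigmoid.
        return hard + soft - soft.detach()

class GatedCNN(nn.Module):
    def __init__(self, n_filters=32, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, n_filters, 3, padding=1)
        self.gater = Gater(n_filters)
        self.head = nn.Linear(n_filters, n_classes)

    def forward(self, x):
        g = self.gater(x)                     # (batch, n_filters) in {0, 1}
        h = F.relu(self.conv(x))              # (batch, n_filters, H, W)
        h = h * g[:, :, None, None]           # per-sample filter pruning
        return self.head(h.mean(dim=(2, 3)))  # global average pool + classify

x = torch.randn(4, 3, 32, 32)
logits = GatedCNN()(x)  # each of the 4 samples uses its own sparse structure

Since the hard gates are exactly 0 or 1 in the forward pass, a gated-off 
filter contributes nothing for that sample, while the straight-through trick 
keeps the gater trainable end to end with the backbone.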


Date:			Friday, 10 May 2019

Time:			10:00am - 12:00noon

Venue:			Room 5510 (lifts 25/26)

Committee Members:	Prof. Nevin Zhang (Supervisor)
			Dr. Raymond Wong (Chairperson)
			Dr. Yangqiu Song
			Prof. Dit-Yan Yeung


**** ALL are Welcome ****