Learning Dynamic and Static Sparse Structures for Deep Neural Networks

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Learning Dynamic and Static Sparse Structures for Deep Neural
Networks"

By

Mr. Zhourong CHEN


Abstract

In the past decade, deep neural networks (DNNs) have produced superior 
results in a wide range of machine learning applications. However, the 
structures of these networks are usually dense and handcrafted by human 
experts. Learning sparse structures from data for DNNs remains a
challenging problem. In this thesis, we investigate learning two types of
sparse structures for DNNs. The first are dynamic sparse structures,
which are conditioned on each individual input sample; the second are
static sparse structures, which are learned from data and then fixed
across all input samples. Learning these sparse structures is expected
to ease overfitting, reduce time and space complexity, and improve the
interpretability of deep models.

For learning dynamic sparse structures, the central problem is how to
dynamically configure the network structure for each individual input
sample on the fly. We propose a new framework called GaterNet for this
problem in convolutional neural networks (CNNs). It is the first
framework in the literature for learning dynamic sparse structures.
GaterNet uses a dedicated sub-network to generate binary gates from the
input and, based on the gate values, prunes filters in the CNN for that
specific input. The result is a dynamic CNN that, in effect, processes
different samples with different sparse structures. Our experiments show
that, with the help of this dynamic pruning, the generalization
performance of the CNN can be significantly improved.
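
The gating mechanism can be pictured with a minimal PyTorch-style
sketch. The gater here is a toy pooling-plus-linear network and the
binarization uses a straight-through estimator; these, along with all
names and sizes, are simplifying assumptions for illustration, not the
exact GaterNet design.

    import torch
    import torch.nn as nn

    class BinaryGate(torch.autograd.Function):
        """Hard 0/1 gates in the forward pass; straight-through
        gradients in the backward pass (an illustrative assumption)."""
        @staticmethod
        def forward(ctx, logits):
            return (logits > 0).float()

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output  # straight-through estimator

    class GatedCNN(nn.Module):
        def __init__(self, num_channels=16, num_classes=10):
            super().__init__()
            # Backbone: one conv block whose filters are gated per input.
            self.conv = nn.Conv2d(3, num_channels, 3, padding=1)
            self.head = nn.Linear(num_channels, num_classes)
            # Gater: a small dedicated sub-network mapping the input
            # to one logit per backbone channel.
            self.gater = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(3, num_channels),
            )

        def forward(self, x):
            gates = BinaryGate.apply(self.gater(x))    # (B, C) in {0, 1}
            feats = torch.relu(self.conv(x))           # (B, C, H, W)
            feats = feats * gates[:, :, None, None]    # zero pruned filters
            return self.head(feats.mean(dim=(2, 3)))

    model = GatedCNN()
    logits = model(torch.randn(8, 3, 32, 32))  # per-sample sparse structure

In practice the gater would itself be a deeper network and gates would
be produced for filters throughout the CNN; the one-block version above
only shows the data flow.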

For learning static sparse structures, we propose two methods, Tree
Skeleton Expansion (TSE) and Tree Receptive Field Growing (TRFG), for
standard feedforward neural networks (FNNs). Although many structure
learning methods have been proposed for CNNs, little has been done for
FNNs in the literature, yet there are applications where CNNs are not
applicable and FNNs are the only choice among neural networks. In TSE,
we assume that the data is generated from a multi-layer probabilistic
graphical model (PGM). We construct a tree-structured PGM to model the
data, use its structure as a skeleton, and expand the connections in the
skeleton to form a deep sparse structure for FNNs. TSE is fast, and the
resulting sparse models achieve performance comparable to dense FNNs
with far fewer parameters.
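
As a rough illustration of the skeleton-expansion idea, the following
sketch turns a tree skeleton, given as a parent array over units, into a
sparse weight mask in which each unit is also connected to its multi-hop
neighbours. The expansion rule and all names here are assumptions for
illustration, not the thesis's algorithm.

    import torch
    import torch.nn as nn

    def expand_skeleton(parent, hops=2):
        """Turn a tree skeleton (parent[i] = parent of unit i, -1 for
        the root) into a connectivity mask, expanded to `hops` hops."""
        n = len(parent)
        adj = torch.eye(n)
        for i, p in enumerate(parent):
            if p >= 0:
                adj[i, p] = adj[p, i] = 1.0
        mask = adj.clone()
        for _ in range(hops - 1):
            mask = ((mask @ adj) > 0).float()  # add one more hop
        return mask

    class SkeletonLayer(nn.Module):
        """FNN layer whose weights are masked by the expanded skeleton."""
        def __init__(self, mask):
            super().__init__()
            n = mask.shape[0]
            self.linear = nn.Linear(n, n)
            self.register_buffer("mask", mask)

        def forward(self, x):
            w = self.linear.weight * self.mask  # keep skeleton edges only
            return torch.relu(nn.functional.linear(x, w, self.linear.bias))

    parent = [-1, 0, 0, 1, 1, 2, 2]            # small binary tree, 7 units
    layer = SkeletonLayer(expand_skeleton(parent))
    out = layer(torch.randn(4, 7))
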
In TRFG, we draw inspiration from convolutional layers, where each unit
is connected to a group of strongly correlated units in a spatially
local region. As general data has no such explicit spatial structure, we
propose to build a tree-structured PGM over the input units such that
strongly correlated units are close to each other in the tree. We then
construct the next layer by introducing a unit for each local region in
the PGM. Repeating the process on each layer leads to a deep sparse FNN.
Experiments show that TRFG can efficiently capture the salient
correlations at different layers and learn sparse models with better
performance and interpretability than dense FNNs.
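
The receptive-field-growing step might look roughly like the sketch
below. A greedy correlation-based grouping stands in for the thesis's
tree-structured PGM, and every function and parameter name is an
assumption for illustration.

    import numpy as np
    import torch

    def grow_receptive_fields(X, group_size=4):
        """Group input units by absolute correlation; each group becomes
        the receptive field of one unit in the next layer. A greedy
        stand-in for the tree-structured PGM over the input units."""
        corr = np.abs(np.corrcoef(X, rowvar=False))
        np.fill_diagonal(corr, 0.0)
        unassigned = set(range(X.shape[1]))
        groups = []
        while unassigned:
            seed = unassigned.pop()
            # the units most correlated with the seed form its "region"
            neighbors = sorted(unassigned, key=lambda j: corr[seed, j],
                               reverse=True)
            group = [seed] + neighbors[:group_size - 1]
            unassigned -= set(group)
            groups.append(group)
        # one next-layer unit per group: a sparse connectivity mask,
        # usable exactly like the mask in the TSE sketch above
        mask = torch.zeros(len(groups), X.shape[1])
        for i, g in enumerate(groups):
            mask[i, g] = 1.0
        return mask

    mask = grow_receptive_fields(np.random.randn(1000, 32))  # (8, 32)

Applying the same step to the activations of each newly built layer
grows the sparse structure one layer at a time.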


Date:			Tuesday, 20 August 2019

Time:			3:00pm - 5:00pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Chun-Man Chan (CIVL)

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Prof. James Kwok
 			Prof. Dit-Yan Yeung
 			Prof. Can Yang (MATH)
 			Prof. Pascal Poupart (University of Waterloo)


**** ALL are Welcome ****