Network Compression via Loss-aware Quantization

PhD Thesis Proposal Defence


Title: "Network Compression via Loss-aware Quantization"

by

Miss Lu HOU


Abstract:

Deep neural network models, though very powerful and highly successful, are 
computationally expensive in terms of both space and time. Recently, there 
have been a number of attempts at quantizing the network weights and 
activations. These attempts greatly reduce the network size and make it 
possible to deploy deep models in resource-constrained environments, such as 
small computing devices. However, most existing quantization schemes are 
based on simple matrix approximations and ignore the effect of quantization 
on the loss.

In this thesis, we first propose to directly minimize the loss with respect 
to the quantized weights. The resulting optimization problem can be solved by 
a proximal Newton algorithm, with the diagonal Hessian approximated by the 
second moments already maintained by the RMSProp or Adam optimizer.
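
For concreteness, one way to write the resulting proximal Newton step (a 
sketch in standard notation; the exact formulation in the thesis may differ) 
is, in LaTeX notation:

    \hat{w}_t = \arg\min_{w \in \mathcal{Q}}\;
        \nabla \ell(\hat{w}_{t-1})^\top (w - \hat{w}_{t-1})
        + \tfrac{1}{2}\, (w - \hat{w}_{t-1})^\top D_{t-1} (w - \hat{w}_{t-1}),

where Q is the set of quantized (e.g., binary) weights and D_{t-1} is the 
diagonal Hessian approximation built from the optimizer's second-moment 
estimates.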

We show that for binarization, the underlying proximal step has an efficient 
closed-form solution. Experiments on both feedforward and recurrent networks 
show that the proposed loss-aware binarization algorithm outperforms existing 
binarization schemes. Since binarization often causes accuracy degradation on 
large models, we then extend the loss-aware weight binarization scheme to 
ternarization and m-bit (where m > 2) quantization. Experiments on both 
feedforward and recurrent neural networks show that the proposed scheme 
outperforms state-of-the-art weight quantization algorithms, and is as 
accurate as (or even more accurate than) the full-precision network.
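
As an illustration, the following is a minimal NumPy sketch of one 
loss-aware binarization step under these assumptions: the diagonal Hessian 
is taken as the square root of an Adam-style second-moment estimate, and the 
closed-form proximal step scales sign(w) by a curvature-weighted factor. The 
function names and the exact update order are illustrative, not taken 
verbatim from the thesis.

    import numpy as np

    def adam_second_moment(v, grad, beta2=0.999):
        """Running average of the squared gradient, as maintained by RMSProp/Adam."""
        return beta2 * v + (1.0 - beta2) * grad ** 2

    def loss_aware_binarize(w, d):
        """Closed-form proximal step for binarization with diagonal Hessian diag(d):
        returns alpha * sign(w) with a curvature-weighted scaling alpha."""
        alpha = np.sum(d * np.abs(w)) / np.sum(d)
        b = np.where(w >= 0, 1.0, -1.0)   # sign of w, with ties sent to +1
        return alpha * b

    # One illustrative training iteration for a single weight vector.
    rng = np.random.default_rng(0)
    w = rng.standard_normal(8)        # full-precision weights kept during training
    v = np.zeros_like(w)              # second-moment accumulator
    grad = rng.standard_normal(8)     # gradient of the loss at the quantized weights

    v = adam_second_moment(v, grad)
    d = np.sqrt(v) + 1e-8             # diagonal Hessian approximation
    w = w - grad / d                  # preconditioned (Newton-like) update
    w_hat = loss_aware_binarize(w, d) # binary weights for the next forward pass
    print(w_hat)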

Though weight-quantized models have small storage and fast inference, training 
can still be time-consuming. This can be alleviated with distributed learning. 
To reduce the high communication cost of gradient synchronization, gradient 
quantization has recently been proposed for training deep networks with 
full-precision weights. We therefore conclude by theoretically studying how 
the combination of weight and gradient quantization affects convergence. 
Empirical experiments confirm the theoretical convergence results, and 
demonstrate that quantized networks can speed up training while achieving 
performance comparable to that of full-precision networks.
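
To make the gradient-quantization side concrete, below is a minimal NumPy 
sketch of unbiased stochastic uniform quantization of a gradient to m bits 
per coordinate (in the spirit of schemes such as QSGD); it is an assumed 
example, not the specific scheme analysed in the thesis.

    import numpy as np

    def quantize_gradient(g, m=2, rng=None):
        """Stochastically round each coordinate of g to one of 2**m - 1 uniform
        magnitude levels, with probabilities chosen so that E[output] = g."""
        rng = np.random.default_rng() if rng is None else rng
        levels = 2 ** m - 1
        scale = np.max(np.abs(g))
        if scale == 0:
            return np.zeros_like(g)
        x = np.abs(g) / scale * levels          # magnitudes mapped to [0, levels]
        lower = np.floor(x)
        prob_up = x - lower                     # probability of rounding up
        q = lower + (rng.random(g.shape) < prob_up)
        return np.sign(g) * q * scale / levels  # unbiased reconstruction

    # Only sign(g), the scalar scale, and the m-bit integers q need to be sent.
    g = np.random.default_rng(1).standard_normal(6)
    print(g)
    print(quantize_gradient(g, m=2, rng=np.random.default_rng(2)))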


Date:			Thursday, 25 April 2019

Time:			4:00pm - 6:00pm

Venue:			Room 4475 (lifts 25/26)

Committee Members:	Prof. James Kwok (Supervisor)
			Dr. Brian Mak (Chairperson)
			Dr. Wei Wang
			Prof. Tong Zhang (MATH)


**** ALL are Welcome ****