Understanding Transformer in Natural Language Processing

PhD Thesis Proposal Defence


Title: "Understanding Transformer in Natural Language Processing"

by

Mr. Han SHI


Abstract:

Transformer-based models are widely used in natural language processing (NLP) 
and have achieved strong performance on various downstream tasks, such as text 
classification, machine translation, question answering and text generation. 
Yet even though Transformer-based models have achieved great success in many 
fields, few works delve deeper into the Transformer architecture itself.

In this proposal, we attempt to analyse and understand the Transformer 
architecture from three different perspectives. First, we focus on the 
self-attention module in the Transformer and propose a differentiable 
architecture search method to find important attention patterns. In contrast 
to prior work, we find that the diagonal elements of the attention map can be 
dropped without harming performance. To explain this observation, we provide a 
theoretical proof from the perspective of universal approximation. 
Furthermore, based on our proposed search method, we obtain a series of 
attention masks for efficient architecture design.
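
As a hedged illustration of the diagonal-dropping observation (a minimal 
sketch only, not the search method from the proposal), masking the diagonal of 
a self-attention map might look as follows in PyTorch:

    import torch
    import torch.nn.functional as F

    def attention_without_diagonal(q, k, v):
        # q, k, v: (batch, seq_len, dim)
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5   # (batch, seq_len, seq_len)
        n = scores.size(-1)
        # Drop the diagonal: each token is forbidden from attending to itself.
        diag = torch.eye(n, dtype=torch.bool, device=scores.device)
        scores = scores.masked_fill(diag, float('-inf'))
        # Softmax re-normalises over the remaining off-diagonal positions.
        weights = F.softmax(scores, dim=-1)
        return weights @ v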

Second, we attempt to understand the feed-forward module in the Transformer 
within a unified framework. Specifically, we introduce the concept of memory 
tokens and establish the relationship between the feed-forward and 
self-attention modules. Moreover, we propose a novel architecture named 
uni-attention, which contains all four types of attention connections in our 
framework. Uni-attention achieves better performance than previous baselines 
given the same number of memory tokens.
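
A hedged sketch of this unified view (an illustration of reading the 
feed-forward layer as attention over learned memory tokens; the function and 
argument names are hypothetical, and this is not the proposed uni-attention 
architecture itself):

    import torch

    def ffn_as_memory_attention(x, memory_keys, memory_values):
        # x: (batch, seq_len, dim)
        # memory_keys, memory_values: (num_memory, dim) learned parameters
        # A feed-forward layer act(x W1^T) W2 can be read as attention over a
        # fixed set of memory tokens: rows of W1 act as keys, rows of W2 as values.
        scores = x @ memory_keys.t()        # (batch, seq_len, num_memory)
        weights = torch.relu(scores)        # ReLU in place of softmax
        return weights @ memory_values      # (batch, seq_len, dim)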

Finally, we investigate the over-smoothing phenomenon in the whole Transformer 
architecture. We provide a theoretical analysis by building a connection 
between self-attention and graphs. Specifically, we find that layer 
normalization plays an important role in the over-smoothing problem and verify 
this empirically. To alleviate the issue, we propose hierarchical fusion 
architectures that make the output representations more diverse.
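
As a hedged illustration of what over-smoothing means in practice (a common 
diagnostic, not the theoretical analysis in the proposal), one can track the 
mean pairwise cosine similarity of token representations at each layer; as 
representations over-smooth, this value approaches 1:

    import torch
    import torch.nn.functional as F

    def mean_token_similarity(h):
        # h: (seq_len, dim) hidden states of one layer
        h = F.normalize(h, dim=-1)
        sim = h @ h.t()                     # (seq_len, seq_len) cosine similarities
        n = sim.size(0)
        # Average over off-diagonal entries (exclude each token with itself).
        off_diagonal = sim.sum() - sim.diagonal().sum()
        return off_diagonal / (n * (n - 1))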


Date:			Friday, 22 July 2022

Time:			3:00pm - 5:00pm

Zoom Meeting:		https://hkust.zoom.us/j/5599077828

Committee Members:	Prof. James Kwok (Supervisor)
			Dr. Brian Mak (Chairperson)
			Dr. Hao Chen
			Dr. Minhao Cheng


**** ALL are Welcome ****