Recurrent Poisson Process Unit for Automatic Speech Recognition

MPhil Thesis Defence


Title: "Recurrent Poisson Process Unit for Automatic Speech Recognition"

By

Mr. Hengguan HUANG


Abstract

Over the past few years, there has been a resurgence of interest in using 
recurrent neural network-hidden Markov models (RNN-HMM) for automatic 
speech recognition (ASR). Some modern recurrent network models, such as 
long short-term memory (LSTM) and simple recurrent unit (SRU), have 
demonstrated promising results on this task. Recently, several scientific 
perspectives in the fields of neuroethology and speech production suggest 
that human speech signals may be represented in discrete point patterns 
involving acoustic events in the speech signal. If this hypothesis holds, 
RNN-HMM acoustic modeling faces several challenges: first, it arbitrarily 
discretizes the continuous input into interval features at a fixed frame 
rate, which may introduce discretization errors; second, the times at 
which such acoustic events occur are unknown. Furthermore, the training 
targets of RNN-HMM are obtained from other (inferior) models, giving rise 
to misalignments.

On the other hand, the temporal point process is a powerful mathematical 
tool to describe the latent mechanisms governing the occurrences of 
observed random events. It is a random process whose realization consists 
of a sequence of isolated events with their time-stamps. Due to their 
generality, point processes have been widely used for modeling phenomena 
such as earthquakes, human activities, financial data, context-aware 
recommendations, etc. Major research in this area focuses on exploring the 
observed event data to model the underlying dynamics of the system, while 
our work attempts to deal with the situation where acoustic events are not 
available/observed even during training.
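
As a minimal illustration of the point-process view described above (not 
taken from the thesis), the sketch below simulates the simplest case, a 
homogeneous Poisson process, by accumulating exponentially distributed 
inter-event gaps. The function name, rate, horizon and seed are all 
hypothetical choices made for this example.

```python
import random


def simulate_poisson_process(rate, horizon, seed=0):
    """Return event time-stamps of a homogeneous Poisson process on [0, horizon).

    Assumption for illustration: constant intensity `rate` (events per unit time).
    """
    rng = random.Random(seed)
    events = []
    # Inter-event gaps of a Poisson process are i.i.d. exponential(rate).
    t = rng.expovariate(rate)
    while t < horizon:
        events.append(t)
        t += rng.expovariate(rate)
    return events


# A realization is just a sequence of isolated, time-stamped events.
events = simulate_poisson_process(rate=5.0, horizon=2.0)
```

In this toy setting the events are observed directly; the work announced 
here addresses the harder case where they are never observed.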

In this thesis, we propose a recurrent Poisson process (RPP) which can be 
seen as a collection of Poisson processes at a series of time intervals in 
which the intensity evolves according to the RNN hidden states that encode 
the history of the acoustic signal. It aims at allocating the latent 
acoustic events in continuous time. Such events are efficiently drawn from 
the RPP using a sampling-free solution in an analytic form. The speech 
signal containing latent acoustic events is reconstructed/sampled 
dynamically from the discretized acoustic features using linear 
interpolation, in which the weight parameters are estimated from the onset 
of these events. The above processes are further integrated into an SRU, 
forming our final model, called recurrent Poisson process unit (RPPU). 
Experimental evaluations on ASR tasks including CHiME-2, WSJ0 and WSJ0&1 
demonstrate the effectiveness and benefits of the RPPU. For example, it 
achieves a relative word error rate (WER) reduction of 10.7% over 
state-of-the-art models on WSJ0.
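
The abstract does not give the RPP formulas, so the following is only a 
rough sketch of the general idea under stated assumptions, not the 
author's method. It assumes a constant intensity within each frame 
interval, obtained from a hypothetical hidden-state projection through a 
softplus; it uses the closed-form mean of the first event time conditioned 
on the event falling inside the interval as the "sampling-free" onset; and 
it uses that onset, normalized by the interval length, as a linear 
interpolation weight between two adjacent feature frames. All names and 
values are illustrative.

```python
import math


def softplus(x):
    """Numerically stable softplus, used here to keep the intensity positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def expected_onset(lam, delta):
    """Mean first-event time in [0, delta], given that an event occurs there.

    Assumes a constant intensity `lam` over the interval (truncated
    exponential): E[t | t < delta] = 1/lam - delta / (exp(lam*delta) - 1).
    """
    z = lam * delta
    if z < 1e-6:   # near-zero intensity: onset is uniform, mean is the midpoint
        return delta / 2.0
    if z > 50.0:   # very high intensity: the truncation term is negligible
        return 1.0 / lam
    return 1.0 / lam - delta / math.expm1(z)


def interpolate(frame_a, frame_b, weight):
    """Linear interpolation between two feature frames."""
    return [(1.0 - weight) * a + weight * b for a, b in zip(frame_a, frame_b)]


delta = 0.01                  # assumed 10 ms frame shift
lam = softplus(2.0) / delta   # hypothetical intensity from a hidden-state projection
w = expected_onset(lam, delta) / delta   # onset as a fraction of the interval
resampled = interpolate([0.0, 1.0], [1.0, 3.0], w)
```

Because the onset is computed analytically rather than sampled, the weight 
is a smooth function of the intensity in this sketch, which is one 
plausible reason such a formulation would be convenient for gradient-based 
training.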


Date:			Monday, 12 November 2018

Time:			10:00am - 12:00 noon

Venue:			Room 5619
 			Lifts 31/32

Committee Members:	Dr. Brian Mak (Supervisor)
 			Prof. Dit-Yan Yeung (Chairperson)
 			Dr. Yangqiu Song


**** ALL are Welcome ****