Artificial Neural Network in Topic modeling and Language Modeling

PhD Qualifying Examination


Title: "Artificial Neural Network in Topic modeling and Language Modeling"

by

Mr. Wei LI


Abstract:

In recent years, the artificial neural network model family has seen
increasing popularity in natural language processing. In particular, with
the development of deep neural networks, many new models have been
proposed for topic modeling and language modeling. An advantage of neural
network models is that they can learn abstract representations of the
input features in their intermediate layers, and can represent the
relationships among inputs in a continuous space. These abstract patterns
in the hidden layers not only capture higher-level linguistic information,
but also reduce the dimensionality of the input features. The relationship
information in the representation can also help reduce data sparsity. As a
result, multiple studies in natural language processing have found that
neural network models achieve better performance than traditional models
with one-hot binary input.

Topic modeling and language modeling are closely related, both in theory
and in practice. Generative topic models such as latent Dirichlet
allocation (LDA) describe the document-topic-word arrangement using latent
variables. They can be seen as bag-of-words language models that ignore
word order. However, word order information is likely to be useful for
topic modeling and text categorization tasks: it may capture phrases or
particular combinations of words that appear regularly in particular topic
settings. Conversely, topic information can also contribute to language
modeling. The simplest approach is to add a topic vector as an extra input
feature to a neural network language model, so as to adapt the language
model towards specific topics.

This article surveys important methods for topic modeling and language
modeling, with an emphasis on models related to neural networks. It also
includes a section on distributed representations of words, a research
area closely related to both language modeling and topic modeling.


Date:               Thursday, 18 February 2016

Time:               2:30pm - 4:30pm

Venue:              Room 4504
                    Lifts 25/26

Committee Members:  Dr. Brian Mak (Supervisor)
                    Prof. Nevin Zhang (Chairperson)
                    Dr. Raymond Wong
                    Prof. Dit-Yan Yeung


**** ALL are Welcome ****