Latent Tree Models: An Application and an Extension

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Latent Tree Models: An Application and an Extension"

By

Mr. Kin-Man Poon


Abstract

Latent tree models are a class of probabilistic graphical models.  These 
models have a tree structure, in which the internal nodes represent latent 
variables whereas the leaf nodes represent manifest variables.  They allow 
fast inference and have been used for multidimensional clustering, latent 
structure discovery, density estimation, and classification.

This thesis makes two contributions to the research of latent tree models. 
The first contribution is a new application of latent tree models in 
spectral clustering.  In spectral clustering, one defines a similarity 
matrix for a collection of data points, transforms the matrix to get the 
so-called Laplacian matrix, finds the eigenvectors of the Laplacian 
matrix, and obtains a partition of the data points using the leading 
eigenvectors. The last step is sometimes referred to as rounding, where 
one needs to decide how many leading eigenvectors to use, to determine the 
number of clusters, and to partition the data points.

We propose to use latent tree models for rounding.  The method differs 
from previous rounding methods in three ways.  First, we relax the 
assumption that the number of clusters equals the number of eigenvectors 
used. Second, when deciding how many leading eigenvectors to use, we not 
only rely on information contained in the leading eigenvectors themselves, 
but also make use of the subsequent eigenvectors.  Third, our method is 
model-based and solves all the three subproblems of rounding using latent 
tree models.  We evaluate our method on both synthetic and real-world 
data.  The results show that our method works correctly in the ideal case 
where between-clusters similarity is 0, and degrades gracefully as one 
moves away from the ideal case.

The second contribution is an extension to latent tree models.  While 
latent tree models have been shown to be useful in data analysis, they 
contain only discrete variables and are thus limited to discrete data. 
One has to resort to discretization if analysis on continuous data is 
needed.  However, this leads to loss of information and may make the 
resulting models harder to interpret.

We propose an extended class of models, called pouch latent tree models, 
that allow the leaf nodes to represent continuous variables.  This 
extension allows the models to work on continuous data directly.  These 
models also generalize Gaussian mixture models, which are commonly used 
for model-based clustering on continuous data.  We use pouch latent tree 
models for facilitating variable selection in clustering, and demonstrate 
on some benchmark data that it is more appropriate to facilitate variable 
selection than to perform variable selection traditionally.  We further 
demonstrate the usefulness of the models by performing multidimensional 
clustering on some real-world basketball data.  Our results exhibit 
multiple meaningful clusterings and interesting relationships between the 
clusterings.


Date:			Wednesday, 1 August 2012

Time:			2:00pm – 4:00pm

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Andrew Poon (ECE)

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Prof. Brian Mak
 			Prof. Dit-Yan Yeung
 			Prof. Bingyi Jing (MATH)
                       	Prof. Manfred Jaeger (Aalborg Univ.)


**** ALL are Welcome ****