Latent Tree Models for Multivariate Density Estimation: Algorithms and Applications

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence



By

Mr. Yi Wang


Abstract

Multivariate density estimation is a fundamental problem in applied 
statistics and machine learning. Given a collection of data sampled from 
an unknown distribution, the task is to approximately reconstruct the 
generative distribution. There are two approaches to the problem: the 
parametric approach and the non-parametric approach. In the parametric 
approach, the approximate distribution is represented by a model from a 
predetermined family.

In this thesis, we adopt the parametric approach and investigate the use 
of a model family called latent tree models for density estimation. 
Latent tree models are tree-structured Bayesian networks in which leaf 
nodes represent observed variables and internal nodes represent hidden 
variables. Such models can capture complex relationships among the 
observed variables while still admitting efficient inference. 
Consequently, they are a desirable tool for density estimation.

While this thesis is the first to study latent tree models for density 
estimation, they have previously been investigated for clustering and 
latent structure discovery. Several algorithms for learning latent tree 
models have been proposed. The state of the art is an algorithm called 
EAST. EAST determines model structures through principled and systematic 
search, and determines model parameters using the EM algorithm. It has 
been shown to achieve a good trade-off between fit to data and model 
complexity, and it is capable of discovering latent structures behind 
data. Unfortunately, it has high computational complexity, which limits 
its applicability to density estimation problems.

In this thesis, we propose two latent tree model learning algorithms 
designed specifically for density estimation. The two algorithms have 
distinct characteristics and are suitable for different applications. The 
first algorithm, HCL, assumes a predetermined bound on model complexity 
and restricts itself to binary tree structures. It first builds a binary 
tree structure based on mutual information and then runs the EM algorithm 
once on the resulting structure to determine the parameters. As such, it 
is efficient and can handle large applications. The second algorithm, 
Pyramid, assumes no predetermined bound on model complexity and is not 
restricted to binary tree structures. It builds model structures using 
mutual-information-based heuristics and local search. It is slower than 
HCL, but it is faster than EAST and only slightly inferior to EAST in 
the quality of the resulting models.
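The bottom-up, mutual-information-driven structure building described for HCL can be pictured with a short sketch. The code below is an illustrative simplification, not the thesis algorithm: it greedily merges the two most mutually informative subtrees under a new latent parent, summarises each subtree by a single representative observed column, and omits the EM parameter-estimation step entirely.

```python
import math
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    joint, pi, pj = {}, {}, {}
    for row in data:
        a, b = row[i], row[j]
        joint[(a, b)] = joint.get((a, b), 0) + 1
        pi[a] = pi.get(a, 0) + 1
        pj[b] = pj.get(b, 0) + 1
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in joint.items())

def build_binary_structure(data, num_vars):
    """HCL-style sketch: repeatedly merge the two most mutually
    informative subtrees under a new latent parent.  Each subtree is
    summarised by one representative observed column, a crude
    simplification made purely for illustration."""
    forest = [(v, v) for v in range(num_vars)]  # (subtree, representative)
    while len(forest) > 1:
        i, j = max(combinations(range(len(forest)), 2),
                   key=lambda p: mutual_information(
                       data, forest[p[0]][1], forest[p[1]][1]))
        (tx, cx), (ty, cy) = forest[i], forest[j]
        merged = (("latent", tx, ty), cx)  # introduce a latent parent
        forest = [t for k, t in enumerate(forest) if k not in (i, j)]
        forest.append(merged)
    return forest[0][0]

# toy data: columns 0 and 1 are perfectly correlated, column 2 is noise
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
print(build_binary_structure(data, 3))  # ('latent', 2, ('latent', 0, 1))
```

As expected, the two correlated columns are paired first, and the independent column joins only at the root.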

In this thesis, we also study two applications of the density estimation 
techniques that we develop. The first application is approximate 
probabilistic inference in Bayesian networks. A Bayesian network 
represents a joint distribution over a set of random variables. It often 
happens that the network structure is so complex that inference directly 
on the network is computationally intractable. We propose to approximate 
the joint distribution with a latent tree model and exploit the latent 
tree model for faster inference. The idea is to sample data from the 
Bayesian network, learn a latent tree model from the data offline, and, 
when online, make inference with the latent tree model instead of the 
original Bayesian network. HCL is used here because the sample size needs 
to be large to produce an accurate approximation, and it is possible to 
predetermine a bound on the online running time. Empirical evidence shows 
that this method achieves good approximation accuracy at low online 
computational cost.
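The offline phase of this scheme starts with forward sampling from the Bayesian network. As a hedged illustration, the sketch below draws samples from a toy two-node network (Rain -> WetGrass, an invented stand-in for the complex networks discussed above); in the actual method, a learner such as HCL would then be fitted to the sampled data, and online queries would be answered on the fitted latent tree model.

```python
import random

def forward_sample(rain_p=0.3, wet_given_rain=(0.1, 0.9)):
    """Draw one joint sample from a toy two-node Bayesian network
    Rain -> WetGrass by sampling parents before children."""
    rain = random.random() < rain_p
    wet = random.random() < wet_given_rain[int(rain)]
    return rain, wet

# offline phase: generate a large dataset; a latent tree learner would
# then be trained on it (the learning step itself is omitted here)
random.seed(0)
data = [forward_sample() for _ in range(5000)]
rain_freq = sum(r for r, _ in data) / len(data)  # close to 0.3
wet_freq = sum(w for _, w in data) / len(data)   # close to 0.3*0.9 + 0.7*0.1
```

Because the sampler only ever draws a variable after its parents, the empirical frequencies in `data` converge to the network's marginals as the sample grows, which is why a large sample size matters for approximation accuracy.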

The second application is classification. A common approach to this task 
is to formulate it as a density estimation problem: one constructs the 
class-conditional density for each class and then applies Bayes' rule to 
classify new instances. We propose to estimate those class-conditional 
densities using either EAST or Pyramid. Empirical evidence shows that 
this method yields good classification performance. Moreover, the latent 
tree models built for the class-conditional densities are often 
meaningful, which is conducive to user confidence. A comparison between 
EAST and Pyramid reveals that Pyramid is significantly more efficient 
than EAST while delivering more or less the same classification 
performance.
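The Bayes-rule step of this approach can be made concrete with a short sketch. The class-conditional densities here are toy one-dimensional Gaussians standing in for the latent tree models learned by EAST or Pyramid; `classify`, `gaussian`, and the priors are illustrative names invented for this example, not artifacts of the thesis.

```python
import math

def classify(x, priors, densities):
    """Apply Bayes' rule: pick the class c maximising
    log P(c) + log P(x | c), with one fitted density per class."""
    return max(priors, key=lambda c: math.log(priors[c])
                                     + math.log(densities[c](x)))

def gaussian(mu, sigma):
    """Toy 1-D class-conditional density (a stand-in for a learned
    latent tree model)."""
    return lambda x: math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) \
                     / (sigma * math.sqrt(2 * math.pi))

priors = {"A": 0.5, "B": 0.5}
densities = {"A": gaussian(0.0, 1.0), "B": gaussian(3.0, 1.0)}
print(classify(0.2, priors, densities))  # -> A
```

Working in log space avoids numerical underflow when the densities are products over many variables, as they are for a latent tree model.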


Date:			Tuesday, 18 August 2009

Time:			2:00pm-4:00pm

Venue:			Room 3501
 			Lifts 25-26

Chairman:		Prof. Hai Yang (CIVL)

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Prof. Brian Mak
 			Prof. Dit-Yan Yeung
 			Prof. Ning Cai (IELM)
 			Prof. Marek Druzdzel (Inf. Sci.,
 					      Univ. of Pittsburgh)


**** ALL are Welcome ****