Automated Document Indexing using Topic Hierarchies from HLTA

MPhil Thesis Defence


Title: "Automated Document Indexing using Topic Hierarchies from HLTA"

By

Mr. Chun Fai LEUNG


Abstract

Hierarchical Latent Tree Analysis (HLTA) is a recently proposed
algorithm for hierarchical topic detection. It takes a collection of 
unlabeled and unstructured text documents as input and outputs a hierarchy 
of topics, where each topic is a subset of documents. In this thesis, we
present an automated document indexing system that builds an index
structure for a corpus of documents from the topic hierarchy obtained by
HLTA. The system also provides tools for visualizing facts and
relationships that can be extracted from the topic hierarchy. We
demonstrate the usefulness of the system on three datasets: (1) a 
collection of research papers published at major AI conferences and 
journals from 2000 to 2018, (2) two collections of research outputs from 
researchers at the Hong Kong University of Science and Technology, and (3) 
a collection of Chinese web posts related to migration, posted on social
media and e-commerce platforms and collected by the International
Organization for Migration.


Date:			Monday, 20 August 2018

Time:			2:00pm - 4:00pm

Venue:			Room 3494
 			Lifts 25/26

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Dr. Raymond Wong (Chairperson)
 			Prof. Dik-Lun Lee


**** ALL are Welcome ****