Conditional Random Field Autoencoders for Feature-Rich, Unsupervised NLP

======================================================================
                Joint Seminar
======================================================================
The Hong Kong University of Science & Technology
Human Language Technology Center
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
---------------------------------------------------------------------
Speaker:        Prof. Chris DYER
                Carnegie Mellon University

Title:          "Conditional Random Field Autoencoders for Feature-Rich,
                 Unsupervised NLP"

Date:           Friday, 7 November, 2014

Time:           11:00am - 12 noon

Venue:          Lecture Theater G (near lifts 25/26), HKUST

Abstract:

Human language is the result of cognitive processes whose contours
are---at best---incompletely understood. Given the incomplete information
we have about the processes involved, the frequently disappointing results
obtained from attempts to use unsupervised learning to uncover latent
linguistic structures (e.g., part-of-speech sequences, syntax trees, or
word alignments in parallel data) can be attributed---in large part---to
model misspecification.

This work introduces a novel framework for unsupervised learning of
structured predictors with overlapping, global features. Each input's
latent representation is predicted conditional on the observable data
using a feature-rich conditional random field. Then, conditional on the
latent structure, a reconstruction of the input is generated from
cheaply estimated multinomials. The autoencoder structure enables
efficient inference without unrealistic independence assumptions,
allowing us to incorporate often conflicting, overlapping theories
(in the form of hand-crafted features) about how latent structures
relate to observed data in a coherent model. We contrast our approach
with traditional joint unsupervised models that are trained to maximize
the marginal likelihood of the observed data. We report
state-of-the-art results with instantiations of the model for two
canonical NLP tasks, part-of-speech induction and bitext word
alignment, and show that training our model is substantially more
efficient than training existing feature-rich models.
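
For orientation, a rough sketch of the kind of objective this
describes (the notation here is assumed, not taken from the abstract):
a feature-rich CRF scores candidate latent structures y for an input
x, per-label multinomials generate a reconstruction x-hat, and
learning maximizes the probability of reconstructing the observed
input, marginalizing over the latent structure:

    p(\hat{x} \mid x) = \sum_y \frac{\exp\{\Lambda \cdot f(x, y)\}}{Z(x)}
                        \prod_i \theta_{\hat{x}_i \mid y_i},

with \hat{x} set equal to the observed input x during training.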

This is joint work with Waleed Ammar and Noah A. Smith.


********************
Biography:

Dr. Chris DYER is an Assistant Professor of Language Technologies in the
School of Computer Science (SCS) at Carnegie Mellon University. He also
holds an affiliated appointment in the Machine Learning Department at the
same institution. Dr. DYER received his Ph.D. in Linguistics from the
University of Maryland at College Park in December 2010, where his
research with Prof. Philip Resnik developed tractable statistical models
of machine translation that are robust to errors in automatic linguistic
analysis components by simultaneously considering billions of
alternative analyses during translation and deferring decisions about
uncertain inputs until the translation pipeline completes. Tractability
relied on automata-theoretic insights that previously found their
primary application in compiler design. The techniques he developed have been
widely adopted in other research labs and in commercial translation
applications. The translation and learning software developed during Dr.
DYER's thesis work is publicly available and has been used in courses in
natural language processing and machine translation at several
institutions, in three Ph.D. dissertations, and in numerous publications
by other authors. In addition to numerous scientific articles and one
patent, Dr. DYER co-authored a book with Dr. Jimmy Lin, Data-Intensive
Text Processing with MapReduce, published by Morgan & Claypool. Since its
publication, the book has been widely used in courses around the world.
Following his graduate work, Dr. DYER was a post-doctoral associate at
Carnegie Mellon with Dr. Noah Smith. Their work developed probabilistic
models of natural language that incorporate structured prior linguistic
knowledge, achieving better predictive performance than models with
uninformative priors, particularly for low-resource languages.