Binarizing Synchronous Grammars for Machine Translation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
Human Language Technology Center

		SEMINAR

Title: "Binarizing Synchronous Grammars for Machine Translation"

Speaker: Liang HUANG, University of Pennsylvania

Date: 30 July 2007
Time: 16:00-17:30
Venue: Room 1505 (Lifts 25-26)

Abstract:

Joint work with Hao Zhang (Rochester), Dan Gildea (Rochester), and
Kevin Knight (USC/ISI)

Syntax-based MT systems have seen promising improvements in
translation quality, but are often computationally too expensive to
be practical. In this talk, I will address this problem of efficiency
in systems based on synchronous grammars and tree transducers. In
this model, the theoretical complexity of decoding is exponential in
the size of individual grammar rules due to arbitrary re-orderings
between the two languages. To alleviate this problem, we develop a
theory of binarization for synchronous grammars which factors big
rules into smaller, binary rules, that permit efficient decoding and
are in fact equivalent to Inversion Transduction Grammars (ITG). We
also present a linear-time algorithm for synchronous binarization. In
our large-scale experiments, we found that the syntactic re-orderings
between Chinese and English are almost always binarizable and the
resulting rule set significantly improves the speed and accuracy of
ISI's syntax-based MT system, which is currently the state-of-the-art.

Besides efficiency, this novel technique of binarization also
provides a tool to study the formal and empirical properties of
synchronous grammars. For example, we are now able to address the
famous question by Dekai Wu that whether there are non-ITG re-
orderings between fixed-word-order languages such as English and
Chinese. Our empirical results on hand-aligned English-Chinese
parallel text confirmed that the pro-ITG conjecture is largely
correct (99.7% cases are binarizable), while we also provide, for the
first-time, "real-examples" of non-binarizable reordering verified by
native speakers. If time permitting, I would also present results on
freer-word-order languages.

References:

* Hao Zhang, Liang Huang, Daniel Gildea, and Kevin Knight (2006).
  Synchronous Binarization for Machine Translation. HLT-NAACL 2006, NY.

* Liang Huang, Hao Zhang, Daniel Gildea, and Kevin Knight (2007).
  Binarization of Synchronous Context-Free Grammars. Submitted to
  Computational Linguistics.

Biography:

Liang Huang is finishing his fourth year in the PhD program at the
University of Pennsylvania, under Aravind Joshi. He is mainly
interested in the theoretical aspects of computational linguistics,
in particular, efficient algorithms in parsing and machine
translation, and formal properties of synchronous grammars. He also
works on applying computational linguistics to structural biology.

***All are welcome***