PhD Qualifying Examination


"Word sense disambiguation vs. statistical machine translation"

Miss Marine Carpuat

Abstract:

In this survey, we review word sense disambiguation (WSD) and
statistical machine translation (SMT) literature in light of the
recent WSD vs. SMT debate.

WSD, the task of resolving sense ambiguity to identify the right
translation of a word, is one of the major challenges faced by language
translation systems. If the English word "drug" translates into French
as either "drogue" (a narcotic) or "médicament" (a medicine), then an
English-French MT system needs to disambiguate every use of "drug" in
order to choose the correct translation.

Considerable effort has gone into designing and evaluating dedicated
WSD models, in particular through the Senseval series of workshops.
This is partly motivated by the often unstated assumption that any full
translation system, to achieve full performance, will sooner or later
have to incorporate individual WSD components.

However, in most machine translation architectures, in particular SMT,
the WSD problem is typically not addressed explicitly; instead, the
translation engine already implicitly factors many contextual features
into lexical choice.

In this context, a question energetically debated at conferences over
the past year is whether even the new state-of-the-art WSD models
actually have anything to offer to full-scale SMT systems.

We will show that dedicated WSD has led to several useful insights for
SMT, and describe how typical SMT models perform WSD. Finally, we will
discuss the main challenges in integrating state-of-the-art dedicated
WSD models into current SMT architectures.


Date:     		Wednesday, 21 September 2005

Time:                   12:00 noon - 2:00 p.m.

Venue:                  Room 4480
			lifts 25-26

Committee Members:      Dr. Dekai Wu (Supervisor)
			Dr. Brian Mak (Chairperson)
			Dr. Dit-Yan Yeung
			Dr. Pascale Fung (ELEC)


**** ALL are Welcome ****