PhD Thesis Proposal Defence

"Word sense disambiguation vs. statistical machine translation"

By Miss Marine Carpuat

Abstract:

We propose to empirically demonstrate that dedicated word sense disambiguation (WSD) systems are useful to statistical machine translation (SMT), and to directly investigate the related issues raised by the WSD vs. SMT debate.

WSD, the task of resolving sense ambiguity to identify the right translation of a word, is one of the major challenges faced by language translation systems. For example, the English word "drug" translates into French as either "drogue" (a narcotic) or "médicament" (a medicine), so an English-French machine translation system needs to disambiguate every use of "drug" in order to translate it correctly. Considerable effort has gone into designing and evaluating dedicated WSD models, in particular through the Senseval series of workshops. This effort is partly motivated by the often unstated assumption that any full translation system, to achieve full performance, will sooner or later have to incorporate individual WSD components. However, most machine translation architectures, SMT in particular, do not address the WSD problem explicitly. This paradoxical situation has encouraged speculation that SMT models are already very good at WSD and that current WSD systems have nothing to offer state-of-the-art SMT.

We propose to address these issues directly by conducting an empirical investigation of the WSD vs. SMT debate. A critical survey of both the WSD and SMT literature shows that current SMT systems already benefit from some WSD insights, but it is still unclear whether the new state-of-the-art WSD models can actually help improve translation quality.

We will first introduce the HKUST WSD system, which achieves the best known performance on the Senseval-3 Chinese lexical sample task and has other properties desirable for our study.
Then we will present empirical results suggesting that while typical SMT models cannot disambiguate word translations as well as dedicated WSD systems, simple methods for incorporating WSD predictions do not help translation quality. Based on error analysis, we will suggest new directions to incorporate WSD predictions in SMT.

Date: Monday, 21 November 2005
Time: 1:00 p.m. - 3:00 p.m.
Venue: Room 4480, lifts 25-26

Committee Members:
Dr. Dekai Wu (Supervisor)
Dr. Dit-Yan Yeung (Chairperson)
Dr. Brian Mak
Dr. Pascale Fung (ELEC)

**** ALL are Welcome ****