HKUST's ideal research environment is situated in beautiful Clear Water Bay, Hong Kong.

LO, Chi-kiu 羅致翹 (Jackie)

PhD candidate, Department of Computer Science and Engineering, HKUST

Human Language Technology Center
Department of Computer Science and Engineering
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Kowloon, Hong Kong
lab +852 2358-8831 room 2580 (lifts 27/28 and 29/30)
jackielo (at) cs (dot) ust (dot) hk http://www.cs.ust.hk/~jackielo/

Chi-kiu LO received her Master of Philosophy (MPhil) in Computer Science and Engineering in 2009 and her Bachelor of Engineering (BEng) in Computer Science (minor in Social Science) in 2004 from the Hong Kong University of Science and Technology (HKUST). She is currently pursuing her Doctor of Philosophy (PhD) degree in the Department of Computer Science and Engineering, HKUST, under the supervision of Prof. Dekai WU. Her research interest is statistical natural language processing, with a particular focus on machine translation and semantic role labeling. She is experienced in teaching computer science courses at different levels, ranging from computer literacy and fundamental programming for non-CS undergraduates, through advanced programming for junior CS undergraduates and artificial intelligence and natural language processing for senior CS undergraduates, to knowledge management for IT taught postgraduates and natural language processing for CS research postgraduates.

Research interests

Semantic machine translation evaluation; machine translation; semantic analysis of text; statistical natural language processing; text mining; machine learning and data mining; customer relationship management (CRM).

Publications

  • Chi-kiu LO and Dekai WU. "A Radically Simple, Effective Annotation and Alignment Methodology for Semantic Frame Based SMT and MT Evaluation". OpenMT-2 Workshop on Using Linguistic Information for Hybrid Machine Translation (LiHMT 2011). Barcelona, Spain: November 2011.

    We introduce a radically simple yet effective methodology for annotating and aligning semantic frames inexpensively using untrained lay annotators that is ideally suited for practical semantic SMT and evaluation applications. For example, recent work by Lo and Wu (2011) introduced MEANT and HMEANT, which are state-of-the-art metrics that evaluate translation meaning preservation via Propbank-style semantic frames. For such applications, however, we argue that the Propbank annotations are too complex and detailed, since they are aimed at training linguists to annotate semantic frames with gold standard accuracy. Instead, we believe that annotating semantic frames for such purposes should be as intuitive as understanding the basic event structure of a sentence, which any untrained human does effortlessly. We propose a simplified set of annotation guidelines consisting of half a page plus three annotated examples. Together with a graphical user interface designed to facilitate the annotation and comparison process by guiding untrained humans step by step, only 5 to 15 minutes are needed to train lay annotators. This allows the lay annotators to focus on understanding the translation to provide consistent and efficient annotation and comparison. The methodology is 'cloud' based to be truly platform independent, installation-free and portable.

  • Simon SHI, Pascale FUNG, Emmanuel PROCHASSON, Chi-kiu LO and Dekai WU. "Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web". Fifth International Joint Conference on Natural Language Processing (IJCNLP 2011). Chiang Mai, Thailand: November 2011.

  • Markus SAERS, Dekai WU, Chi-kiu LO and Karteek ADDANKI. "Speech Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction". Twelfth Annual Conference of the International Speech Communication Association (Interspeech 2011). Florence, Italy: August 2011.

  • Chi-kiu LO and Dekai WU. "SMT vs. AI redux: How semantic frames evaluate MT more accurately". Twenty-second International Joint Conference on Artificial Intelligence (IJCAI 2011). Barcelona, Spain: July 2011.

    We argue for an alternative paradigm in evaluating machine translation quality that is strongly empirical but more accurately reflects the utility of translations, by returning to a representational foundation based on AI oriented lexical semantics, rather than the superficial flat n-gram and string representations recently dominating the field. Driven by such metrics as BLEU and WER, current SMT frequently produces unusable translations where the semantic event structure is mistranslated: who did what to whom, when, where, why, and how? We argue that it is time for a new generation of more "intelligent" automatic and semi-automatic metrics, based clearly on getting the structure right at the lexical semantics level. We show empirically that it is possible to use simple PropBank style semantic frame representations to surpass all currently widespread metrics' correlation to human adequacy judgments, including even HTER. We combine the best of both worlds: from an SMT perspective, we provide superior yet low-cost quantitative objective functions for translation quality; and yet from an AI perspective, we regain the representational transparency and clear reflection of semantic utility of structural frame-based knowledge representations.

  • Chi-kiu LO and Dekai WU. "Structured vs. Flat Semantic Role Representations for Machine Translation Evaluation". Fifth Workshop on Syntax and Structure in Statistical Translation (SSST-5). Portland, Oregon, US: June 2011.

    We argue that failing to capture the degree of contribution of each semantic frame in a sentence explains puzzling results in recent work on the MEANT family of semantic MT evaluation metrics, which have disturbingly indicated that dissociating semantic roles and fillers from their predicates actually improves correlation with human adequacy judgments even though, intuitively, properly segregating event frames should more accurately reflect the preservation of meaning. Our analysis finds that both properly structured and flattened representations fail to adequately account for the contribution of each semantic frame to the overall sentence. We then show that the correlation of HMEANT, the human variant of MEANT, can be greatly improved by introducing a simple length-based weighting scheme that approximates the degree of contribution of each semantic frame to the overall sentence. The new results also show that, without flattening the structure of semantic frames, weighting the degree of each frame's contribution gives HMEANT higher correlations than the previously best-performing flattened model, as well as HTER.

  • Chi-kiu LO and Dekai WU. "MEANT: inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles". 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011). Portland, Oregon, US: June 2011.

    We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, non-automatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric, HMEANT, achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semi-automated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.

  • Dekai WU, Pascale FUNG, Marine CARPUAT, Chi-kiu LO, Yongsheng YANG, and Zhaojun WU. "Lexical Semantics for Statistical Machine Translation". In Joseph Olive, Caitlin Christianson, and John McCary (editors), Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation. Springer. 2010.

    We present efforts toward moving statistical machine translation toward incorporating semantic modeling. The most glaring types of errors made by current systems appear to be prime targets for lexical semantics models, which have heretofore been largely absent from statistical machine translation models. Although sense disambiguation and semantic roles both appear highly relevant to translation accuracy, experience suggests that simply dropping in the existing models is unlikely to improve translation accuracy; rather, adaptations will be necessary. We discuss (1) a new Phrase Sense Disambiguation model that successfully improves statistical phrase-based translation for the first time by making three critical adaptations to traditional word sense disambiguation configurations, and (2) a series of empirical studies that illuminate more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy.

  • Chi-kiu LO and Dekai WU. "Semantic vs. Syntactic vs. N-gram Structure for Machine Translation Evaluation". Fourth Workshop on Syntax and Structure in Statistical Translation (SSST-4). Beijing, China: August 2010.

    We present results of an empirical study on evaluating the utility of machine translation output, by assessing the accuracy with which human readers are able to complete the semantic role annotation templates. Unlike the widely-used lexical and n-gram based or syntactic based MT evaluation metrics, which are fluency-oriented, our results show that using semantic role labels to evaluate the utility of MT output achieves higher correlation with human judgments on adequacy. In this study, human readers were employed to identify the semantic role labels in the translation. For each role, the filler is considered an accurate translation if it expresses the same meaning as that annotated in the gold standard reference translation. Our SRL based f-score evaluation metric has a 0.41 correlation coefficient with the human judgment on adequacy, while in contrast BLEU has only a 0.25 correlation coefficient and the syntactic based MT evaluation metric STM has only a 0.32 correlation coefficient with the human judgment on adequacy. Our results strongly indicate that using semantic role labels for MT evaluation can be significantly more effective and better correlated with human judgment on adequacy than BLEU and STM.

  • Chi-kiu LO and Dekai WU. "Evaluating Machine Translation Utility via Semantic Role Labels". Seventh International Conference on Language Resources and Evaluation (LREC-2010). Malta: May 2010.

    We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key semantic roles. Such roles can be annotated using Propbank-style PRED and ARG labels. Recent work by Wu and Fung (2009) introduced methods based on automatic semantic role labeling into statistical machine translation, to enhance the quality of MT output. However, semantic SMT approaches have so far still only been evaluated using lexical and n-gram based SMT evaluation metrics such as BLEU, which are not aimed at evaluating the utility of MT output. Direct data analysis is still needed to understand how semantic models can be leveraged to evaluate the utility of MT output. In this paper, we discuss a new methodology for evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to match the Propbank annotation frames.

  • Chi-kiu LO and Dekai WU. "HKUST Statistical Machine Translation Experiments for CWMT 2009". The 5th China Workshop on Machine Translation (CWMT 2009). Nanjing: Oct 2009. 132-136.

  • Chi-kiu LO. "Using Semantic Role Labels to reorder Statistical Machine Translation output". Master of Philosophy Thesis. Department of Computer Science and Engineering, HKUST: August 2009.

  • Yihai SHEN, Chi-kiu LO, Marine CARPUAT and Dekai WU. "HKUST Statistical Machine Translation Experiments for IWSLT 2007". Fourth International Workshop on Spoken Language Translation (IWSLT 2007). Trento: Oct 2007. 84-88.

Grants and Awards

Course Teaching


    Last updated: 1 Feb, 2012