LO, Chi-kiu 羅致翹 (Jackie)
PhD candidate, Department of Computer Science and Engineering, HKUST
Human Language Technology Center
Department of Computer Science
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Kowloon, Hong Kong
lab +852 2358-8831 room 2580 (lifts 27/28 and 29/30)
jackielo (at) cs (dot) ust (dot) hk http://www.cs.ust.hk/~jackielo/
Chi-kiu LO received her Master of Philosophy (MPhil) in Computer Science and Engineering in 2009 and her Bachelor of Engineering (BEng) in Computer Science (minor in Social Science) in 2004 from the Hong Kong University of Science and Technology (HKUST). She is currently pursuing her Doctor of Philosophy (PhD) degree in the Department of Computer Science and Engineering, HKUST, under the supervision of Prof. Dekai Wu. Her research interest is statistical natural language processing, with a particular focus on machine translation and semantic role labeling. She is experienced in teaching computer science courses at different levels, ranging from computer literacy and fundamental programming for non-CS undergraduates, advanced programming for junior CS undergraduates, and artificial intelligence and natural language processing for senior CS undergraduates, to knowledge management for IT taught postgraduates and natural language processing for CS research postgraduates.
Research interests
Semantic machine translation evaluation; machine translation; semantic analysis of text; statistical natural language processing; text mining; machine learning and data mining; customer relationship management (CRM).
Publications
We introduce a radically simple yet effective methodology for annotating and aligning semantic frames inexpensively using untrained lay annotators, one that is ideally suited for practical semantic SMT and evaluation applications. For example, recent work by Lo and Wu (2011) introduced MEANT and HMEANT, which are state-of-the-art metrics that evaluate translation meaning preservation via Propbank-style semantic frames. For such applications, however, we argue that the Propbank annotations are too complex and detailed, since they are aimed at training linguists to annotate semantic frames with gold standard accuracy. Instead, we believe that annotating semantic frames for such purposes should be as intuitive as understanding the basic event structure of a sentence, which any untrained human does effortlessly. We propose a simplified set of annotation guidelines consisting of half a page plus three annotated examples. Together with a graphical user interface designed to facilitate the annotation and comparison process by guiding untrained humans step by step, only 5 to 15 minutes are needed to train lay annotators. This allows the lay annotators to focus on understanding the translation in order to provide consistent and efficient annotation and comparison. The methodology is 'cloud' based so as to be truly platform independent, installation-free and portable.
We argue for an alternative paradigm in evaluating machine translation quality that is strongly empirical but more accurately reflects the utility of translations, by returning to a representational foundation based on AI oriented lexical semantics, rather than the superficial flat n-gram and string representations recently dominating the field. Driven by such metrics as BLEU and WER, current SMT frequently produces unusable translations where the semantic event structure is mistranslated: who did what to whom, when, where, why, and how? We argue that it is time for a new generation of more "intelligent" automatic and semi-automatic metrics, based clearly on getting the structure right at the lexical semantics level. We show empirically that it is possible to use simple PropBank style semantic frame representations to surpass all currently widespread metrics' correlation to human adequacy judgments, including even HTER. We combine the best of both worlds: from an SMT perspective, we provide superior yet low-cost quantitative objective functions for translation quality; and yet from an AI perspective, we regain the representational transparency and clear reflection of semantic utility of structural frame-based knowledge representations.
We argue that failing to capture the degree of contribution of each semantic frame in a sentence explains puzzling results in recent work on the MEANT family of semantic MT evaluation metrics, which have disturbingly indicated that dissociating semantic roles and fillers from their predicates actually improves correlation with human adequacy judgments even though, intuitively, properly segregating event frames should more accurately reflect the preservation of meaning. Our analysis finds that both properly structured and flattened representations fail to adequately account for the contribution of each semantic frame to the overall sentence. We then show that the correlation of HMEANT, the human variant of MEANT, can be greatly improved by introducing a simple length-based weighting scheme that approximates the degree of contribution of each semantic frame to the overall sentence. The new results also show that, without flattening the structure of semantic frames, weighting the degree of each frame's contribution gives HMEANT higher correlations than the previously best-performing flattened model, as well as HTER.
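As a rough illustration of the length-based weighting idea, the sketch below combines per-frame scores into a sentence-level score, weighting each frame by the number of tokens it covers. This is a minimal sketch under stated assumptions, not the published HMEANT formulation: the function name `weighted_frame_score`, the `(span_length, frame_score)` input format, and the choice of a token-count-weighted average are all hypothetical.

```python
from typing import List, Tuple


def weighted_frame_score(frames: List[Tuple[int, float]]) -> float:
    """Combine per-frame scores into one sentence-level score.

    Each item is (span_length, frame_score), where span_length is the
    number of tokens the frame covers (an assumed proxy for its degree
    of contribution) and frame_score is a value in [0, 1].
    """
    total_length = sum(length for length, _ in frames)
    if total_length == 0:
        # No frames, or only empty frames: nothing to score.
        return 0.0
    weighted_sum = sum(length * score for length, score in frames)
    return weighted_sum / total_length


# A long, fully correct frame and a short, fully wrong one.
frames = [(8, 1.0), (2, 0.0)]
print(weighted_frame_score(frames))
```

With uniform weights the two frames above would average to 0.5; weighting by length lets the frame covering most of the sentence dominate the score.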
We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, non-automatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric HMEANT achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semi-automated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.
We present efforts toward moving statistical machine translation toward incorporating semantic modeling. The most glaring types of errors made by current systems appear to be prime targets for lexical semantics models, which have heretofore been largely absent from statistical machine translation models. Although sense disambiguation and semantic roles both appear highly relevant to translation accuracy, experience suggests that simply dropping in the existing models is unlikely to improve translation accuracy; rather, adaptations will be necessary. We discuss (1) a new Phrase Sense Disambiguation model that successfully improves statistical phrase-based translation for the first time by making three critical adaptations to traditional word sense disambiguation configurations, and (2) a series of empirical studies that illuminate more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy.
We present results of an empirical study on evaluating the utility of machine translation output, by assessing the accuracy with which human readers are able to complete the semantic role annotation templates. Unlike the widely-used lexical and n-gram based or syntactic based MT evaluation metrics, which are fluency-oriented, our results show that using semantic role labels to evaluate the utility of MT output achieves higher correlation with human judgments on adequacy. In this study, human readers were employed to identify the semantic role labels in the translation. For each role, the filler is considered an accurate translation if it expresses the same meaning as that annotated in the gold standard reference translation. Our SRL based f-score evaluation metric has a 0.41 correlation coefficient with the human judgment on adequacy, while in contrast BLEU has only a 0.25 correlation coefficient and the syntactic based MT evaluation metric STM has only a 0.32 correlation coefficient with the human judgment on adequacy. Our results strongly indicate that using semantic role labels for MT evaluation can be significantly more effective and better correlated with human judgment on adequacy than BLEU and STM.
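The f-score over semantic role fillers mentioned above can be sketched as follows. This is an illustrative sketch only, assuming the standard balanced f-score over role-filler counts; the `RoleCounts` type, its field names, and the `srl_f_score` function are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RoleCounts:
    correct: int       # fillers in the MT output judged to match the reference
    in_output: int     # role fillers annotated in the MT output
    in_reference: int  # role fillers annotated in the reference translation


def srl_f_score(counts: RoleCounts) -> float:
    """Balanced f-score of role-filler precision and recall."""
    if counts.in_output == 0 or counts.in_reference == 0:
        return 0.0
    precision = counts.correct / counts.in_output
    recall = counts.correct / counts.in_reference
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 3 of 4 fillers in the MT output match; the reference has 6 fillers.
print(srl_f_score(RoleCounts(correct=3, in_output=4, in_reference=6)))
```

Here precision is 3/4 and recall is 3/6, so a translation that drops roles present in the reference is penalized even when the roles it does contain are translated accurately.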
We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key semantic roles. Such roles can be annotated using Propbank-style PRED and ARG labels. Recent work by Wu and Fung (2009) introduced methods based on automatic semantic role labeling into statistical machine translation, to enhance the quality of MT output. However, semantic SMT approaches have so far still only been evaluated using lexical and n-gram based SMT evaluation metrics such as BLEU, which are not aimed at evaluating the utility of MT output. Direct data analysis is still needed to understand how semantic models can be leveraged to evaluate the utility of MT output. In this paper, we discuss a new methodology for evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to match the Propbank annotation frames.
Grants and Awards
- Sept 2011.
Research Travel Grant (RTG11/12.EG012).
Hong Kong University of Science and Technology.
For oral and poster presentation of the paper entitled "SMT vs. AI redux: How semantic frames evaluate MT more accurately" at IJCAI-11, Barcelona, Spain.
- Sept 2011. Postgraduate Studentship. Hong Kong University of Science and Technology.
- Jun 2011.
ACL Student Travel Award.
The Association for Computational Linguistics.
For oral presentation of the papers entitled "MEANT: inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles" at ACL-2011 and "Structured vs. Flat Semantic Role Representations for Machine Translation Evaluation" at SSST-5, Portland, Oregon, US.
- Sept 2010. Postgraduate Studentship. Hong Kong University of Science and Technology.
- Sept 2009. Postgraduate Studentship. Hong Kong University of Science and Technology.
Course Teaching
- Spring 2012: COMP4221 Introduction to Natural Language Processing
- Fall 2011: COMP3031 Principles of Programming Languages
- Spring 2011: COMP300H Introduction to Natural Language Processing
- Fall 2010: COMP221 Fundamentals of Artificial Intelligence
- Spring 2010: COMP300H Introduction to Natural Language Processing
- Fall 2009: COMP221 Fundamentals of Artificial Intelligence
- Summer 2009: CSIT523 Knowledge Management
- Fall 2008: COMP526 Natural Language Processing
- Spring 2008: COMP151 Object-oriented Programming
- Spring 2008: COMP101 Exploring Multimedia and Internet Computing
- Fall 2007: COMP251 Principles of Programming Languages
- Fall 2007: COMP101 Exploring Multimedia and Internet Computing
- Summer 2007: COMP102 Computer and Programming Fundamentals I
- Spring 2007: COMP151 Object-oriented Programming
- Spring 2007: COMP101 Exploring Multimedia and Internet Computing
- Fall 2006: COMP102 Computer and Programming Fundamentals I
- Fall 2006: COMP101 Exploring Multimedia and Internet Computing
Last updated: 1 Feb, 2012
