COMP 4221/5221 - Spring 2018

Spring 2018, COMP 4221 Introduction to Natural Language Processing [3-0-1:3]
Spring 2018, COMP 5221 Natural Language Processing [3-0-0:3]
Lecture 1, M 09:00-11:50, Rm 2406 (L17-18)
Prof. Dekai WU, Rm 3539, 2358-6989

Tut 1A TA: Yuchen YAN and Serkan KUMYOL, W 9:30-10:20, Rm 4503 and

You are welcome to knock on the instructor's door at any time. The TAs' office hours are posted at


Welcome to COMP4221 for UGs and COMP5221 for PGs! (The COMP4221 course was formerly called COMP300H and COMP326, and the COMP5221 course for PGs was formerly called COMP526.) Tutorials will begin in Week 1.

Always check the Discussion Forum for up-to-the-minute announcements.

Discussion forum is at Always read before asking/posting/emailing your question. You must register for your account at the first lecture, tutorial, or lab.
Course home page is at
Tutorial info is at



Abbreviated Course Catalog Description

COMP 4221. Human language technology for text and spoken language. Machine learning, syntactic parsing, semantic interpretation, and context-based approaches to machine translation, text mining, and web search.

Course Description

Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.



To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.


All materials submitted for grading must be your own work. You are advised against being involved in any form of copying (either copying other people's work or allowing others to copy yours). If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course and the incident will be reported for appropriate disciplinary actions.

University policy requires that students who cheat more than once be expelled. Please review the cheating topic from your UST Student Orientation.

Warning: sophisticated plagiarism detection systems are in operation!


You are encouraged to collaborate in study groups. However, you must write up solutions on your own. You must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism.


Course grading will be adjusted to the difficulty of assignments and exams. Moreover, I guarantee you the following.

If you achieve 85% you will receive at least an A grade.
75% B
65% C
55% D

Your grade will be determined by a combination of factors:

Midterm exam ~20%
Final exam ~25%
Participation ~5%
Assignments ~50%
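
As a rough illustration of how the weights and the guaranteed thresholds above combine, here is a minimal Python sketch. It assumes the approximate weights are applied exactly as listed and that every component is scored in percent; the names `WEIGHTS`, `GUARANTEES`, `overall_percent`, and `guaranteed_grade` are hypothetical and not part of any course system. Because grading is adjusted to difficulty, the result is only the minimum guaranteed grade, and the actual grade may be higher.

```python
# Approximate weights from the syllabus, treated as exact here (an assumption).
WEIGHTS = {
    "midterm": 0.20,
    "final": 0.25,
    "participation": 0.05,
    "assignments": 0.50,
}

# Guaranteed minimum grades as (threshold in percent, letter), highest first.
GUARANTEES = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]


def overall_percent(scores):
    """Weighted sum of component scores, each given in percent (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def guaranteed_grade(percent):
    """Minimum grade guaranteed for this percentage, or None below 55%."""
    for threshold, letter in GUARANTEES:
        if percent >= threshold:
            return letter
    return None


# Hypothetical component scores for one student.
example = {"midterm": 80, "final": 70, "participation": 100, "assignments": 90}
total = overall_percent(example)
print(round(total, 1), guaranteed_grade(total))
```

With these made-up scores the weighted total is 83.5%, which falls under the 75% guarantee, so the student would receive at least a B.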


No reading material is allowed during the examinations. No make-ups will be given unless the instructor grants prior approval, or you have a medical condition on the day of the examination that is documented by a physician. In addition, according to university regulations, being absent from the final examination results in automatic failure of the course unless prior approval is obtained from the department head.

There will be one midterm worth approximately 20%, and one final exam worth approximately 25%.


Science and engineering (including software engineering!) is about communication between people. Good participation in class and/or the online forum will count for approximately 5%.


All assignments must be submitted by 23:00 on the due date. Scheme programming assignments must run under Chicken Scheme on Linux. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.

Programming assignments will account for a total of approximately 50%.


All information for tutorials is at


Date Wk Event Topic

2018.02.05 1 Lecture Learning to translate: engineering, social, and scientific motivations
Administrivia (honor statement, HKUST classroom conduct)
2018.02.05 1 Lecture Linguistic relativism and the Sapir-Whorf hypothesis; inductive bias, language bias, search bias; the great cycle of intelligence
2018.02.07 1 Lecture Does God play dice? Assumptions: scientific method, hypotheses, models, learning, probability; languages of the world [at tutorial]
2018.02.12 2 Lecture Is machine translation intelligent? Interactive simulation
2018.02.14 2 Lecture "It's all Chinese to me": linguistic complexity
2018.02.19 3 Holiday Fourth day of Lunar New Year
2018.02.21 3 Lecture "It's all Chinese to me": challenges in modeling translation
2018.02.26 4 Lecture Probability review
2018.02.28 4 Lecture Probability review
2018.03.05 5 Lecture Anagrams
2018.03.05 5 Lecture Markov models, n-gram models
2018.03.05 5 Lecture Anagrams with replacement; uninformed search
2018.03.07 5 Lecture Dijkstra's shortest path algorithm; Chinese anagrams [at tutorial]
2018.04.09 9 Exam Midterm
2018.05.TBA 15 Exam COMP4221 Final [room TBA, time TBA]

Background review
Last updated: 2018.03.12