Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components

MPhil Thesis Defence


Title: "Joint Embeddings of Chinese Words, Characters, and Fine-grained 
Subcharacter Components"

By

Mr. Jinxing YU


Abstract

Word embeddings have attracted much attention recently because they 
represent words simply and generalize well to many downstream tasks. 
Unlike words in alphabetic writing systems such as English, Chinese 
characters are often composed of subcharacter components that are 
themselves semantically informative.

In this paper, we propose an approach to jointly embed Chinese words 
together with their characters and fine-grained subcharacter components. 
We use three likelihoods to evaluate whether the context words, 
characters, and components can each predict the current target word. We 
also collect 13,253 subcharacter components to demonstrate that existing 
approaches to decomposing Chinese characters are insufficient. Evaluation 
on intrinsic word similarity and word analogy tasks, as well as on 
extrinsic downstream classification tasks, demonstrates the superior 
performance of our model.
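
As a rough illustration of the three-likelihood idea described above, the 
sketch below sums three log-likelihoods of a target word, one predicted 
from context words, one from context characters, and one from context 
subcharacter components. It assumes each context is averaged into a single 
vector and scored against the vocabulary with a softmax; the abstract does 
not specify these details, and all names (joint_log_likelihood, 
output_vectors, the toy sizes) are hypothetical rather than taken from the 
thesis.

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))

def joint_log_likelihood(target_id, word_ctx, char_ctx, comp_ctx, output_vectors):
    """Sum of three log-likelihoods of the target word.

    Each *_ctx array has shape (n_items, dim) and holds the embeddings of
    the context words, characters, or subcharacter components.
    Assumption: every context is averaged and scored with a full softmax.
    """
    total = 0.0
    for ctx in (word_ctx, char_ctx, comp_ctx):
        hidden = ctx.mean(axis=0)            # average the context embeddings
        scores = output_vectors @ hidden     # one score per vocabulary word
        total += log_softmax(scores)[target_id]
    return total

# Toy data: random embeddings stand in for trained ones.
rng = np.random.default_rng(0)
dim, vocab = 8, 5
output_vectors = rng.normal(size=(vocab, dim))
word_ctx = rng.normal(size=(4, dim))    # 4 context words
char_ctx = rng.normal(size=(6, dim))    # 6 context characters
comp_ctx = rng.normal(size=(10, dim))   # 10 context components
ll = joint_log_likelihood(2, word_ctx, char_ctx, comp_ctx, output_vectors)
print(ll)
```

Training would maximize this quantity over a corpus. A practical 
implementation would likely replace the full softmax with an approximation 
such as negative sampling, which is again an assumption and not something 
stated in the abstract.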


Date:			Monday, 20 November 2017

Time:			2:00pm - 4:00pm

Venue:			Room 5510
 			Lifts 25/26

Committee Members:	Prof. Nevin Zhang (Supervisor)
 			Dr. Raymond Wong (Chairperson)
 			Dr. Yangqiu Song


**** ALL are Welcome ****