TRANSFER LEARNING WITH OPEN WEB DATA

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "TRANSFER LEARNING WITH OPEN WEB DATA"

By

Mr. Wei XIANG


Abstract

In recent years, transfer learning has been applied to a variety of real-world 
application domains, ranging from text classification, image classification, 
link prediction, and activity recognition to social network analysis. Transfer 
learning is particularly useful when we only have limited labeled data in a 
target domain, which requires that we consult one or more auxiliary or source 
domains to gain insight into how to solve the target problem. Thus, the key to 
successful knowledge transfer is that one or more “right” source data sets 
are supplied by the problem designer at learning time. However, it is 
very difficult to identify a proper set of source data. An intuitive idea is 
to seek the needed source data directly from the open Web. In this 
thesis, we study how to extend existing transfer learning techniques 
to support transfer learning from massive and noisy Web 
data. We focus on the following four research issues: (1) Transfer 
over information gap; (2) Transfer from heterogeneous data; (3) Transfer with 
partially labeled correspondence; (4) Selective transfer from massive and noisy 
sources. For each of the above-mentioned issues, we first conduct an extensive 
study of the difficulty of the problem, and then propose a series of effective 
solutions. Moreover, to handle massive Web data as the source, we also 
investigate how to make our transfer learning models scalable with the aid 
of distributed computing techniques. We apply these methods to two diverse 
applications, text classification and link prediction, and achieve promising 
results. Experimental results show that our methods can successfully benefit 
from the truly useful information contained in the Web, while minimizing the 
risks posed by the massive and noisy nature of the open Web.


Date:			Tuesday, 29 May 2012

Time:			2:00pm – 4:00pm

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Kun Xu (MATH)

Committee Members:	Prof. Qiang Yang (Supervisor)
 			Prof. Shing-Chi Cheung
 			Prof. Raymond Wong
 			Prof. Rong Zheng (ISOM)
 			Prof. Haifeng Wang (Harbin Inst. of Tech.)


**** ALL are Welcome ****