Effective Discovery of Order-Preserving Submatrices

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Effective Discovery of Order-Preserving Submatrices"

By

Miss Qiong Fang


Abstract

As matrix is a common data presentation in many applications, the 
submatrix mining problem has been extensively studied to reveal 
interesting correlations among a set of rows and a set of columns. Various 
submatrix models have been proposed to describe different local 
correlations, among which, the order-preserving submatrix (OPSM) model is 
proposed to capture the fact that the entry values of a set of rows follow 
the same trend under a set of columns. In this thesis, we systematically 
study the problem of mining OPSM patterns to address various real 
application needs.

In gene expression analysis, as the OPSM model is regarded to be 
restrictive and not practical when sample contamination inevitably exists, 
we consider two relaxation strategies to relax the strict OPSM model. 
Three relaxed OPSM models and the corresponding pattern mining methods 
have been proposed. Experimental studies on real biological data show that 
our relaxed OPSM models lead to the discovery of interesting correlations 
with high biological significance. The experiments also justify the 
efficiency of our proposed pattern mining methods.

Due to the prevalence of uncertain data, we define new probabilistic 
matrix representations, and formalize a novel probabilistic OPSM (POPSM) 
model. We propose an efficient POPSM mining framework called ProbApri. We 
demonstrate the superiority of our approach by two applications of POPSM. 
Experiments on biological datasets show that the POPSM model better 
captures the characteristics of the expression levels of biologically 
correlated genes. Experiments on a RFID trace dataset demonstrate the 
effectiveness of our POPSM model in capturing the common visiting behavior 
among users.

Data in applications like recommender systems can also be organized as 
matrices, which however usually have extremely large size and high degree 
of sparsity. We propose a sparse OPSM (SOPSM) model for sparse matrices, 
and adopt two criteria to control the number and distribution of patterns 
mined from large matrices. Accordingly, we formulate a constrained SOPSM 
mining problem, and develop an effective approach called Quick Probing 
(QP) and its parallelization scheme. Experiments on two datasets from real 
recommender systems show that the QP framework achieves high scalability 
and is efficient for handling matrices having millions of rows and 
columns. Besides, QP predicts the ordered preference of users over a set 
of items with high accuracy.


Date:			Thursday, 15 March 2012

Time:			10:00am – 12:00noon

Venue:			Room 3401
 			Lifts 17/18

Chairman:		Prof. Elizabeth George (MGMT)

Committee Members:	Prof. Wilfred Ng (Supervisor)
 			Prof. Lei Chen
 			Prof. Qiang Yang
 			Prof. Weichuan Yu (ECE)
                       	Prof. Chi-Ming Kao (Comp. Sci. & Inf. Sys., HKU)


**** ALL are Welcome ****