Co-Location Pattern Discovery

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Co-Location Pattern Discovery"

By

Miss Xiangye Xiao


Abstract

Co-location pattern discovery is to find classes of objects whose associated 
spatial locations are frequently in proximity. For example, map search queries, 
which contain keywords in text as well as target locations on the map, can be 
mined for co-located query patterns, i.e., sets of keyword queries that often 
search for target locations near one another. Such co-located query patterns 
can be used in location sensitive query suggestion, Point of Interest (POI) 
recommendation, and local advertising.

This thesis investigates ways to improve the efficiency of co-location mining 
for large data sets, e.g., million-entry map search query logs. In particular, 
we improve the efficiency of the generate-and-test method, which is commonly 
used in co-location mining. This method iteratively generates instances of 
candidate patterns, tests and prunes the false candidates.

Through experiments, we find that the major problem in generate-and-test is the 
excessive number of instances generated for false candidates. This problem 
causes both the instance generation and the pruning steps to take an 
unnecessarily long time. Therefore, we propose two orthogonal approaches, 
DenseMiner and BitmapMiner, to address the problem on instance generation and 
on pruning respectively.

DenseMiner partitions the geographic area of objects and processes the 
partitions that contain a high density of objects first. This approach works 
well because the spatial distributions of real-world objects are usually 
non-uniform.  As a result, processing the dense areas first often identifies 
false candidates early on and avoids generating the instances of these false 
candidates in the other areas.

BitmapMiner also partitions objects by their geographical locations. It further 
utilizes a bitmap structure to represent the neighboring relationship between 
classes of objects. The cell size for the bitmap structure is set based on the 
overall density of the objects. The candidate pruning in each cell can be done 
on a per-object basis, on a per-cell basis, or in an adaptive, multi-resolution 
way to match the spatial distribution of the objects in the cell. Consequently, 
we test and prune false candidates efficiently based on the bitmaps.

We have conducted experiments on both synthetic data sets and real-world data 
sets. The experimental results show that our proposed approaches improve the 
execution time by up to an order of magnitude over prior work.


Date:			Thursday, 20 August 2009

Time:			10:00am - 12:00noon

Venue:			Room 3501
 			Lifts 25-26

Chairman:		Prof. David Lam (MECH)

Committee Members:	Prof. Qiong Luo (Supervisor)
 			Prof. Wei-Ying Ma (Supervisor, Microsoft Research Asia)
 			Prof. Dik-Lun Lee
 			Prof. Wilfred Ng
 			Prof. Jiang Xu (ECE)
 			Prof. Jeffrey Yu (Sys. Engg. & Engg. Mgmt., CUHK)


**** ALL are Welcome ****