Efficient Algorithms for SimRank Computation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Efficient Algorithms for SimRank Computation"

By

Mr. Yue WANG


Abstract

Measuring similarities among different nodes is a fundamental problem in graph 
analysis. Among different similarity measurements, SimRank is one of the most 
promising and popular. Due to the huge amount of data generated by the 
activities of people every data, today’s graphs are often big and time 
evolving, examples include on-line social networks and on-line shopping 
platforms. This challenges current algorithms for SimRank computation w.r.t. 
efficiency, scalability and quality of results. In the thesis, we first study 
the problem of all-pairs SimRank computation. Observing current iterative 
methods for all-pairs SimRank are not efficient in time and space, due to 
unnecessary cost and storage by the nature of iterative updating, we propose a 
local push based algorithm, which has the property that not all SimRank scores 
are involved in the computation. The pushon- demand schema can reduce a lot of 
unnecessary cost, and has an accuracy guarantee. We further extend our 
algorithm to track accurate SimRank scores in dynamic graphs, which can address 
the accuracy issue of current incremental solutions. We then study the pairwise 
SimRank estimation problem, observing that current single-pair SimRank 
solutions are either static or inefficient in handling dynamic cases with 
good-quality results, we propose three algorithms to query pairwise SimRank 
over static and dynamic graphs efficiently, by using different sample reduction 
strategies. The accuracy of our algorithms is guaranteed by the different 
invariants we propose for pairwise SimRank. Finally, we study the problem 
finding similar pairs given a set of node pairs with SimRank, which has 
attractive applications in personalized search and recommendation tasks. We 
present an efficient framework for retrieving the top-k similarities from an 
arbitrary set of pairs. In addition, we introduce two types of indexes to boost 
the efficiency, one is hub-based, the other is tree-based.


Date:			Monday, 20 May 2019

Time:			2:30pm - 4:30pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Francesco Ciucci (MAE)

Committee Members:	Prof. Lei Chen (Supervisor)
 			Prof. Qiong Luo
 			Prof. Ke Yi
 			Prof. Can Yang (MATH)
 			Prof. Wenfei Fan (Univ of Edinburgh)

**** ALL are Welcome ****