CROSS-MATCHING BIG ASTRONOMIC CATALOGS ON HETEROGENEOUS CLUSTERS

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "CROSS-MATCHING BIG ASTRONOMIC CATALOGS ON HETEROGENEOUS CLUSTERS"

By

Miss Xiaoying JIA


Abstract

In astronomy, cross-match is a central operation for integrating multi-wavelength
information by identifying celestial objects across multiple catalogs. With the
rapid increase in data volume from space- and ground-based surveys, it has become
essential to process large astronomic catalogs efficiently. In this thesis, we
study how to accelerate the cross-match of billion-record catalogs on a cluster 
of heterogeneous computers with both CPUs and GPUs.

Specifically, we present two cross-match algorithms, namely IB-CM (Index-Based 
Cross-Match) and MASJ-CM (Multi-Assignment Single-Join Cross-Match), and study 
the performance impact of indexing methods as well as design choices and 
optimizations of both algorithms for a heterogeneous computer cluster. We have 
implemented these algorithms to fully utilize the computation and communication
resources of the cluster, and compared them with implementations on Spark and SpatialHadoop,
two popular distributed computing platforms. Our evaluations on real-world 
astronomic catalogs show that our native implementations were orders of 
magnitude faster than those on Spark or SpatialHadoop and that self-matching 
billion-record catalogs on a six-node cluster finished in under five minutes.


Date:			Wednesday, 26 July 2017

Time:			10:00am - 12:00 noon

Venue:			Room 2612B
 			Lifts 31/32

Chairman:		Prof. Zhenyang Lin (CHEM)

Committee Members:	Prof. Qiong Luo (Supervisor)
 			Prof. Lei Chen
 			Prof. Raymond Wong
 			Prof. Wei Zhang (ECE)
 			Prof. Xiaowen Chu (Comp. Sci., Baptist U)


**** ALL are Welcome ****