Parallelizing De Novo Assembly with Heterogeneous Processors

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Parallelizing De Novo Assembly with Heterogeneous Processors"

By

Miss Shuang QIU


Abstract

De novo assemblers construct genome sequences from small fragments
without using any reference genome. Specifically, they represent the
fragments in a de Bruijn graph and traverse the graph to generate the
sequence. Because constructing and traversing a large de Bruijn graph is
both time- and memory-intensive, we develop UNIPAR, a parallel software
package that runs this process on a cluster of GPU-equipped computers. In
particular, it utilizes all processor cores in each CPU and GPU, all CPUs 
and GPUs in a computer node, and all computer nodes of the cluster. 
Furthermore, we analyze the characteristics of genome data to design a 
concurrent hashing algorithm for the graph construction, and to reduce the 
communication overhead in the graph traversal. We further improve the 
overall performance by partitioning and storing the data in a compact 
format, pipelining data transfer and computation, and overlapping 
computation and communication. Our experiments show that on real-world 
datasets, UNIPAR is an order of magnitude faster than state-of-the-art
shared-memory assemblers and more than five times faster than current
distributed assemblers.
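
For readers unfamiliar with the approach, the two steps named in the
abstract (building a de Bruijn graph from fragments, then traversing it)
can be illustrated with a minimal single-threaded sketch. This is not
UNIPAR's implementation: it uses no GPU, no cluster, and no concurrent
hashing, and the function names build_de_bruijn and walk_unambiguous are
hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer in a read adds an edge from
    its (k-1)-mer prefix to its (k-1)-mer suffix. Illustrative only."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a sequence from `start` while the path is unambiguous:
    the current node has exactly one successor, and that successor has
    exactly one predecessor and has not been visited yet."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1

    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if indegree[nxt] != 1 or nxt in visited:
            break
        contig += nxt[-1]  # each step appends one new base
        visited.add(nxt)
        node = nxt
    return contig


# Three overlapping fragments, assumed error-free for this example.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
```

With these three fragments and k = 4, the walk from "ATG" recovers the
sequence ATGGCGTGCA. Real datasets contain sequencing errors, repeats, and
billions of k-mers, which is where the time and memory costs described in
the abstract arise.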


Date:			Thursday, 16 May 2019

Time:			10:00 am - 12:00 noon

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof. Ki Ling Cheung (ISOM)

Committee Members:	Prof. Qiong Luo (Supervisor)
 			Prof. Wilfred Ng
 			Prof. Ke Yi
 			Prof. Weichuan Yu (ECE)
 			Prof. Xiaowen Chu (Baptist U)


**** ALL are Welcome ****