GALLOP: GPU Acceleration for Genomics Applications

Graphics processors, or GPUs, have made high-performance computing inexpensive and widely accessible by packing hundreds of identical computing cores into a single chip. With their massive parallel processing power, GPUs have made their way into genomics applications, both through academic research and through proprietary commercial solutions. Nevertheless, there has been no large-scale, open, and systematic study of accelerating state-of-the-art genomic computing algorithms on the GPU. We therefore propose Gallop, an open-source software package that features new GPU-accelerated algorithms for genomics applications.

Specifically, we focus on four major computational tasks on genome data: (1) genome assembly, in which short reads from an unknown DNA sequence are pieced together into a complete sequence; (2) sequence alignment, in which short reads are aligned to a reference sequence; (3) SNP (single-nucleotide polymorphism) detection, in which variations at individual nucleotide positions are identified by comparing the aligned reads with the reference sequence; and (4) genome-wide association studies (GWAS), which examine the genomes of many individuals of a species to find genetic variants associated with a trait. For each task, we first select the leading computational models by effectiveness and popularity, and optimize the CPU-based algorithm. We then design a new GPU-based parallel algorithm that provides the same interface and functionality as the original CPU-based algorithm. In addition, we optimize memory access and disk I/O, and schedule the CPU, the GPU, and I/O holistically for efficient co-processing.
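To make task (3) more concrete, here is a minimal sketch of SNP detection by majority vote over a pileup of aligned reads. It is a toy illustration under stated assumptions, not the statistical model used in GSNP or any other Gallop component. The reads are assumed to be gapless and already aligned, each given as a hypothetical `(start, read)` pair with a 0-based start position on the reference; `call_snps` and `min_depth` are illustrative names.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=2):
    """Toy SNP caller (illustrative only): majority vote over a read pileup.

    reference:     reference sequence as a string.
    aligned_reads: list of (start, read) pairs; each read is assumed gapless
                   and to begin at 0-based position `start` on the reference.
    min_depth:     hypothetical threshold; positions covered by fewer reads
                   are skipped.
    Returns a list of (position, reference_base, consensus_base) tuples.
    """
    # Build the pileup: count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        consensus, _ = counts.most_common(1)[0]
        if consensus != reference[pos]:
            snps.append((pos, reference[pos], consensus))
    return snps


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGT")]
    print(call_snps(ref, reads))
```

With the sample input, all three reads covering position 4 report `T` where the reference has `A`, so that position is reported as a SNP. Each position's pileup is processed independently, which is the kind of per-site parallelism that maps naturally onto GPU threads; production callers replace the majority vote with a probabilistic model that accounts for base quality and sequencing errors.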

Publications

Software

Nov 11, 2011: GSNP V1.0 code package (300KB).

Software License

The license is a free, non-exclusive, non-transferable license to reproduce, use, modify, and display the source code version of the Software, with or without modifications, solely for non-commercial research, educational, or evaluation purposes. The license does not entitle Licensee to technical support, telephone assistance, enhancements, or updates to the Software. All rights, title to, and ownership interest in the Software, including all intellectual property rights therein, shall remain with HKUST.

Acknowledgement

We thank our collaborator BGI Shenzhen for providing us with application requirements and access to their software and data sets, and for sharing their hardware resources and genomics domain knowledge. Funding for this project is provided by grants 616012 and 617509 from the Hong Kong Research Grants Council and grant MRA11EG01 from Microsoft SQL Server China R&D.