Principles and Automation of Low-Level Optimizations on GPUs

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Principles and Automation of Low-Level Optimizations on GPUs"

By

Mr. Da YAN


Abstract

Performance optimization on GPUs is still not well understood. This thesis 
discusses principles and automation of performance optimizations on NVIDIA 
GPUs, with a special focus on compute-bound kernels. It concentrates on the 
abstraction layers between portable virtual instruction sets (e.g., LLVM IR, 
NVIDIA PTX) and native hardware assembly.

We first introduce the native GPU instruction set, Shader ASSembly (SASS). 
Previously, the public could not customize SASS generation, because the only 
way to generate SASS was to use the closed-source proprietary compiler ptxas. 
ptxas hides many important optimizations, including instruction scheduling. We 
develop an open-source assembler, TuringAs, that lets the public manipulate 
SASS, and we identify new optimization opportunities at the SASS level. For 
instance, using some native SASS instructions helps to reduce register 
pressure, and reordering SASS instructions leads to better instruction-level 
parallelism and thus higher throughput. We evaluate the effectiveness of our 
optimizations on two examples: Winograd convolution (a fast convolution 
algorithm) and Tensor Core matrix multiplication.
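
The effect of instruction reordering on instruction-level parallelism can be 
illustrated with a minimal sketch. This is a toy model only, not the 
scheduling logic of TuringAs or ptxas: it assumes a machine that issues one 
instruction per cycle in program order and stalls until source registers are 
ready. The opcode names, register names, and latencies are hypothetical and do 
not correspond to real SASS encodings or timings.

```python
# Toy in-order issue model (illustrative assumptions, not real SASS timings).
LATENCY = {"LOAD": 4, "FMA": 2}  # hypothetical cycles until a result is ready


def total_cycles(program):
    """Return the cycle at which the last result becomes available."""
    ready_at = {}  # register name -> cycle at which its value is ready
    cycle = 0      # earliest cycle at which the next instruction may issue
    finish = 0
    for op, dst, srcs in program:
        # Stall until every source operand has been produced.
        issue = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        done = issue + LATENCY[op]
        ready_at[dst] = done
        cycle = issue + 1  # one instruction issues per cycle
        finish = max(finish, done)
    return finish


# Each FMA immediately follows the LOAD it depends on, so it stalls.
naive = [
    ("LOAD", "r0", []),
    ("FMA", "r1", ["r0"]),
    ("LOAD", "r2", []),
    ("FMA", "r3", ["r2"]),
]

# Same instructions, but the independent LOADs are issued back to back,
# so their latencies overlap.
reordered = [
    ("LOAD", "r0", []),
    ("LOAD", "r2", []),
    ("FMA", "r1", ["r0"]),
    ("FMA", "r3", ["r2"]),
]

print(total_cycles(naive), total_cycles(reordered))
```

Under these assumed latencies the reordered sequence finishes in fewer cycles 
because the two independent loads overlap, which is the kind of gain that 
scheduling at the SASS level aims for.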

Next, we introduce our effort to automate SASS optimizations to improve 
productivity. Programming in SASS does not scale to a large number of kernels 
or to new GPU architectures. We develop GASS, an LLVM-based compiler that 
automatically translates a high-level virtual representation (i.e., LLVM IR) 
into optimized SASS. We highlight our newly proposed instruction scheduler for 
compute-bound deep learning kernels, our customization of the if-conversion 
pass, and our algorithms to resolve data dependencies. The evaluation shows 
that our algorithms in GASS outperform LLVM's algorithms by a considerable 
margin and that GASS is on par with the highly optimized proprietary compiler 
ptxas.


Date:			Monday, 4 July 2022

Time:			2:00pm - 4:00pm

Zoom Meeting: 
https://hkust.zoom.us/j/96421397372?pwd=a3VPMHJCZS9haGJyUDlIeGNuUWlHdz09

Chairperson:		Prof. Robert KO (LIFS)

Committee Members:	Prof. Wei WANG (Supervisor)
 			Prof. Lionel PARREAUX
 			Prof. Shuai WANG
 			Prof. Wei ZHANG (ECE)
 			Prof. Bei YU (CUHK)


**** ALL are Welcome ****