A Survey for Optimizing Distributed Deep Learning in GPU Cluster

PhD Qualifying Examination


Title: "A Survey for Optimizing Distributed Deep Learning in GPU Cluster"

by

Mr. Xinchen WAN


Abstract:

Deep learning has been widely used in many application domains. Because 
training can take hours or days to complete, distributed systems are 
adopted to finish training in a timely manner. Meanwhile, GPUs remain the 
dominant accelerators for deep learning, which motivates large companies 
to build large-scale GPU clusters and deploy DL applications on them. 
However, how to coordinate communication and computation for training in 
GPU clusters remains to be investigated.

In pursuit of high training efficiency, several optimization techniques 
have been proposed. In this survey, we first provide background on 
distributed deep learning and GPU clusters. We then present and discuss 
several techniques, categorizing them along two aspects: communication 
and computation. Lastly, we conclude by discussing the limitations of 
current studies and suggesting directions for future work.


Date:                   Friday, 10 July 2020

Time:                   4:00pm - 6:00pm

Zoom meeting:           https://hkust.zoom.us/j/99380725107

Committee Members:      Dr. Kai Chen (Supervisor)
                        Dr. Brahim Bensaou (Chairperson)
                        Dr. Qifeng Chen
                        Dr. Yangqiu Song


**** ALL are Welcome ****