A Survey of Synchronization and Scheduling in General-Purpose Distributed Machine Learning Platforms

PhD Qualifying Examination


Title: "A Survey of Synchronization and Scheduling in General-Purpose 
Distributed Machine Learning Platforms"

by

Mr. Chengliang ZHANG


Abstract:

Large datasets and models can achieve state-of-the-art machine learning
results, but training such models is both time-consuming and
computation-intensive. A typical large dataset can take up terabytes of
storage, while a complex model can have billions of parameters to be
trained; no single machine can accommodate such demand. Intuitively, one
can train these models in parallel on distributed clusters of commodity
machines. As a result, recent years have witnessed relentless research
efforts on distributed machine learning.

This survey investigates the state-of-the-art architecture called Parameter
Server, which is tailored for large-scale machine learning problems.
Besides the design philosophy of the parameter server, we focus on its
synchronization schemes and the trade-off between computation efficiency
and consistency. We then survey ongoing efforts to improve parameter
server performance by addressing problems such as heterogeneity, machine
failure, and communication scheduling.


Date:			Wednesday, 25 April 2018

Time:                  	3:00pm - 5:00pm

Venue:                  Room 2611
                         Lifts 31/32

Committee Members:	Dr. Wei Wang (Supervisor)
 			Prof. James Kwok (Chairperson)
 			Dr. Kai Chen
 			Prof. Bo Li


**** ALL are Welcome ****