Towards Efficient and Practical Network Optimization for Big Data Analytics

PhD Thesis Proposal Defence


Title: "Towards Efficient and Practical Network Optimization for Big Data
Analytics"

by

Mr. Hong ZHANG


Abstract:

Scale matters. In the era of big data, the unprecedented growth of data 
scale is fundamentally transforming the way we make sense of it. With the 
rapid rise of cloud computing, applications with massive input datasets 
are scaling out to thousands of machines to efficiently exploit I/O 
parallelism. These applications cover a wide variety of big data analytics 
to uncover hidden patterns, unknown correlations, and other useful 
information from the data.

As one of the major challenges introduced by these data-parallel 
applications, communication among the distributed tasks often results in 
massive data transfers over the network. To address this problem, industry
has made continuous efforts to build high-capacity, low-latency datacenter
networking infrastructure at scale, while academia has concentrated on
developing efficient network optimization mechanisms for big data
analytics.

However, from first-hand experience, we find efficient network
optimization profoundly challenging, especially when it must be performed
in a practical manner. First, application-aware network scheduling using
coflows has been shown to improve application-level communication 
performance. However, existing coflow-based solutions rely on modifying 
the underlying computing frameworks to identify coflows (i.e., to match 
the applications with the flows they generate), making them inapplicable 
to many practical scenarios. Second, precise network load balancing is
crucial for carrying out the network schedule and delivering good
application performance. However, production datacenters operate under
various uncertainties such as traffic dynamics, topology asymmetry, and
failures, which make network load balancing challenging in practice.

This dissertation describes my research efforts in performing efficient 
and practical network optimization for big data analytics. First, we 
propose CODA, a practical application-aware network scheduling framework. 
CODA makes the first attempt at automatically identifying and scheduling 
coflows without any framework-level modification. As a result, it serves
as a necessary and natural step towards practical network optimization
for big data applications. Second, we present Hermes, a resilient load 
balancing scheme tailored for the dynamic and complex datacenter 
environment. Hermes gracefully handles various kinds of uncertainties 
(e.g., traffic dynamics, topology asymmetry, and failures) in a 
readily deployable fashion.


Date:               Tuesday, 2 October 2018

Time:               4:00pm - 6:00pm

Venue:              Room 2303
                    (lifts 17/18)

Committee Members:  Dr. Kai Chen (Supervisor)
                    Prof. Lei Chen (Chairperson)
                    Dr. Wei Wang
                    Dr. Ke Yi


**** ALL are Welcome ****