Towards High Speed Data Center Network: Challenges and Solutions

PhD Thesis Proposal Defence

Title: "Towards High Speed Data Center Network: Challenges and Solutions"


Mr. Shuihai HU


In recent years, the link speed of data center networks (DCNs) has increased
significantly, from 1Gbps to 10Gbps to 40/100Gbps, with 200Gbps on the horizon.
In the era of high-speed DCNs, it is increasingly clear that traditional
kernel-based network transports can no longer meet the requirements of modern
data center applications, mainly for two reasons. First, traditional network
transports adopt reactive congestion control algorithms, which are too slow and
inefficient at high speed. Second, kernel-based transports incur very high CPU
overhead at high speed and thus can hardly deliver low latency and high
throughput to applications and services at low cost. In response to these
drawbacks, considerable effort has been made in recent years. However, existing
solutions either fail to achieve desirable performance or are difficult to
deploy in production environments.

Regarding congestion control for high-speed DCNs, proactive congestion control
solutions have recently drawn great attention in the research community. By
explicitly scheduling packet transfers based on the availability of network
bandwidth, proactive solutions offer a lossless, near-zero-queueing network for
data transmission. Despite these advantages, a major drawback of proactive
solutions is that an extra round-trip time (RTT) is needed to allocate rates to
newly arrived flows. To avoid this delay, existing solutions let new flows
blindly transmit unscheduled packets in the first RTT. These unscheduled
packets, however, can cause severe congestion under heavy workloads, resulting
in large queue buildups and even loss of scheduled packets, which undermines
the desirable properties of proactive solutions.

To provide desirable network performance at low CPU overhead, public cloud
providers such as Microsoft and Google are deploying remote direct memory
access (RDMA) over Converged Ethernet (RoCE) in their data centers to enable
low-latency, high-throughput data transfers with minimal CPU overhead. RoCE
deployments, however, are vulnerable to deadlocks induced by Priority Flow
Control (PFC). Once a deadlock forms, the throughput of part or all of the
network drops to zero due to the backpressure effect of PFC pause frames.

This dissertation describes my research efforts to address the above two
challenges. First, we present Aeolus, a simple yet effective solution that
augments all existing proactive solutions. With Aeolus, two seemingly
contradictory goals are achieved simultaneously: eliminating the additional
one-RTT delay while preserving all the good properties of proactive solutions.
Second, we propose Tagger, a practical deadlock prevention scheme for RDMA
DCNs. By carrying tags in packets and installing pre-generated match-action
rules in the switches for tag manipulation and buffer management, Tagger
guarantees deadlock freedom using only modest buffers, without any changes to
the routing protocol or switch hardware.

Date:                   Monday, 3 December 2018

Time:                   11:00am - 1:00pm

Venue:                  Room 3494
                        (lifts 25/26)

Committee Members:      Dr. Kai Chen (Supervisor)
                        Dr. Yangqiu Song (Chairperson)
                        Prof. Lei Chen
                        Dr. Wei Wang

**** ALL are Welcome ****