REPRODUCING SHARED-MEMORY CONCURRENCY BUGS FOR BUG DIAGNOSTICS

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "REPRODUCING SHARED-MEMORY CONCURRENCY BUGS FOR BUG DIAGNOSTICS"

By

Mr. Jinguo ZHOU


Abstract

Concurrency bug is a major stumbling block to writing multi-threaded programs. 
The capability of reproducing a concurrency bug is key to comprehending and 
finally fixing the bug. Unlike sequential programs, concurrent programs may 
behave differently even given the same input, making concurrency bugs much 
harder to be reproduced than bugs caused within a single thread. This thesis 
targets on the reproduction of shared-memory concurrency bugs, together with 
new techniques for the bug diagnostics given the capability of reproducing the 
bug.

First, we contribute Stride, a deterministic replay technique for concurrent 
programs, that improves the efficiency of the recording phase, which is key to 
the practicability of replay techniques. Stride records the bounded read-write 
linkage instead of the exact linkage, which eliminates synchronization 
requirements on read operations during the recording phase and is still able to 
synthesize a legal execution in polynomial time. Comparing to the previous 
state-of-the-art approaches of deterministic replay, Stride is on average 2.5 
times faster.

Second, we handle concurrency bug reproduction under two special scenarios with 
special requirements. One is the bug reproduction for relaxed memory model(RMM) 
specific behaviors, the challenge of which is the Heisenberg effects caused by 
the code instrumentation incurred by bug reproduction techniques (Some RMM 
specific behaviors cannot occur in the instrumented program), which leads to a 
poor coverage. We contribute SC-Transformation, the first technique that 
enables safe instrumentation under RMMs with the provable guarantee of full 
coverage, i.e., instrumentations on the program do not prevent RMM-specific 
behaviors. Therefore, we can apply all kinds of instrumentations to 
deterministically reproduce RMM-specific behaviors. Experiments show that, 
supported by SC-Transformation, we successfully recorded all of the seventy-one 
RMM-specific bug traces we tested, none of which could be captured atop naive 
program instrumentation. The second special scenario is the bug reproduction on 
the production runs on I/O intensive programs, the challenge of which is the 
long running time and the large input size. We contribute POINT, which departs 
from the conventional philosophy of reproducing a similar execution by going a 
new direction of reproducing only the interleaving windows occurring among a 
few lines of code. Being more focused on the source of concurrency bugs, POINT 
requires no duplication of the user inputs and supports programs running for 
months.

Finally, we have developed a method that deterministically pinpoints the root 
cause of concurrency bugs given the capability of reproducing the bug, which is 
achieved by deterministically isolating the effects of the context-switches 
within the buggy execution.


Date:			Thursday, 17 March 2016

Time:			2:00pm – 4:00pm

Venue:			Room 4504
  			Lifts 25/26

Chairman:		Prof. Zongjin Li (CIVL)

Committee Members:	Prof. Charles Zhang (Supervisor)
  			Prof. Shing-Chi Cheung
  			Prof. Fangzhen Lin
  			Prof. Maosheng Xiong (MATH)
  			Prof. Grigore Rosu (U of Illinois At
 					    Urbana-Champaign)


**** ALL are Welcome ****