Automatic Localization of Code Omission Faults

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Automatic Localization of Code Omission Faults"

By

Mr. Xinming Wang


Abstract

Debugging is a tedious and expensive activity in software maintenance. To 
ease debugging, researchers have proposed various dynamic analysis 
techniques that aid developers in locating the program defects based on 
passed and failed test runs. In this thesis, we study a prevalent type of 
defects called code omission faults, which involve missing code and cannot 
be ascribed to any program entity that exists in the program. Code 
omission faults pose two major challenges to existing dynamic analysis 
techniques. First, missing code offers no execution behavior to observe 
directly. Second, even when the place of omission could have been 
accurately pinpointed, it is still difficult for developers to conjecture 
the missing code.

To address these challenges of code omission, we conducted an empirical 
study to characterize real-world omission faults. Among non-trivial bug 
fixes extracted from large open source projects GNU/GCC and MYSQL, we 
found that over half of them correspond to omission faults, which can be 
further categorized into 11 sub-types. Among them, we made a novel 
discovery that some sub-types, such as missing-assignment or 
missing-return, often induce certain dynamic control flows and data flows 
patterns when they trigger program failures. We express these patterns in 
an event flow language and show that by matching them against program 
execution, it is possible to accurately locate these sub-types of omission 
faults and infer part of the missing code.

The remaining sub-types of omission faults, such as missing-if-condition 
and missing-branch, are more complex and induce patterns that vary with 
the missing code, which is mostly unknown at the time of debugging. For 
these complex omission faults, we empirically observed that the missing 
code likely appear elsewhere in a similar or even the same form. Inspired 
by this observation, we propose an approach called learning-to-debug. The 
basic idea is to systematically remove different parts of the program and 
learn the control-/data- flow patterns that are associated with these 
pieces of artificially removed code. The inferred patterns are then 
applied on the original program to locate the omission faults. We have 
designed a novel pattern representation and a learning algorithm in order 
to make learning-to-debug scalable.

To evaluate our approaches, we used both real world and seeded omission 
faults in four benchmark programs: space, grep, bc, and gcc. Promising 
results are reported in the experiments.


Date:			Tuesday, 24 August 2010

Time:			9:00am – 11:00am

Venue:			Room 3501
 			Lifts 25/26

Chairman:		Prof. Kai Tang (MECH)

Committee Members:	Prof. Shing-Chi Cheung (Supervisor)
 			Prof. Lin Gu
 			Prof. Charles Zhang
                         Prof. Ajay Joneja (IELM)
                         Prof. Michael Lyu (Comp. Sci. & Engg., CUHK)


**** ALL are Welcome ****