Predicting Bugs by Analyzing Software History

Speaker:	Dr. Sunghun KIM
		MIT

Title: 		"Predicting Bugs by Analyzing Software History"

Date:		Monday, 7 April 2008

Time:		4:00pm - 5:00pm

Venue:		Lecture Theatre F
		(Leung Yat Sing Lecture Theatre, near lifts 25/26)
		HKUST

Abstract:

Almost all software contains undiscovered bugs, ones that have not yet
been exposed by testing or by users. Where are these bugs located?
This talk presents two approaches for predicting the location of bugs by
analyzing software history. First, a bug cache holds 10% of the files
in a software project. Through an analysis of the software's development
history and the location of bugs, files are added to and removed from the
cache based on four bug localities: temporal, spatial, changed-entity, and
new-entity locality. After processing, files in the bug cache contain
73-95% of undiscovered bugs. Second, to further improve the localization
of predicted bugs, automatic change classification uses information from
the configuration management commit transactions. Using machine learning
techniques (Bayes Net, Support Vector Machines), we classify each commit
as either likely or unlikely to contain a fault. The best precision and
recall figures for each project are typically in the mid-70s. Hence, it
is possible for a configuration management system to
inform a developer, post-commit, that they have just created a bug (with
approximately 94% likelihood).
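
The bug cache idea in the abstract can be illustrated with a minimal
sketch. This is not the speaker's implementation: the class name BugCache,
its method names, and the least-recently-used eviction policy are all
assumptions made for illustration. Only the four localities and the
fixed cache size come from the abstract.

```python
from collections import OrderedDict


class BugCache:
    """Fixed-size set of file paths predicted to be fault-prone.

    Hypothetical sketch: eviction is least-recently-used (an assumption;
    the abstract does not name a replacement policy).
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._files = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def __contains__(self, path):
        return path in self._files

    def load(self, path):
        """Insert or refresh a file, evicting the oldest entry if full."""
        if path in self._files:
            self._files.move_to_end(path)
            return
        if len(self._files) >= self.capacity:
            self._files.popitem(last=False)
        self._files[path] = True

    def on_change(self, changed=(), added=()):
        """Changed-entity and new-entity locality: load touched files."""
        for path in list(changed) + list(added):
            self.load(path)

    def on_bug_fix(self, path, co_changed=()):
        """Record a bug found in `path`; return True if it was predicted."""
        hit = path in self._files
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        # Spatial locality: files changed together with the faulty file.
        for neighbor in co_changed:
            self.load(neighbor)
        # Temporal locality: the faulty file itself becomes most recent.
        self.load(path)
        return hit

    def hit_rate(self):
        """Fraction of recorded bugs that were already in the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Replaying a project's history through on_change and on_bug_fix, with
capacity set to about 10% of the project's files, gives a hit rate that
plays the role of the 73-95% figure quoted above, under this sketch's
assumptions.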


****************************
Biography:

Sunghun KIM is a postdoctoral associate at MIT and a member of the Program
Analysis Group. He completed his Ph.D. in the Computer Science Department
at the University of California, Santa Cruz in 2006. For six years he
was Chief Technical Officer (CTO) of Nara Vision Co. Ltd, a leading
Internet software company in Korea, where he led a 25-person team. His core
research area is Software Engineering, focusing on software evolution,
program analysis, and empirical studies.