ReLink: Recovering Links between Bugs and Changes
 


 

Introduction

Software defect information, including links between bugs and committed changes, plays an important role in software maintenance tasks such as measuring quality and predicting defects. These links are usually mined automatically from change logs and bug reports using heuristics, such as searching change logs for specific keywords and bug IDs. However, the accuracy of these heuristics depends on the quality of the change logs. Bird et al. found that many links are missing because change logs lack bug references. They also found that the missing links bias the defect information and affect defect prediction performance.
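The keyword-and-ID heuristic mentioned above can be sketched roughly as follows. This is a minimal illustration, not the heuristic used in any particular study: the regular expressions, the function name find_links, and the sample change logs are all assumptions made for the example.

```python
import re

# Hypothetical patterns; real projects tune these to their own log conventions.
KEYWORD = re.compile(r"\b(?:fix(?:e[sd])?|bugs?|defects?|patch)\b", re.IGNORECASE)
BUG_ID = re.compile(r"(?:bug|issue|#)\s*#?(\d+)", re.IGNORECASE)


def find_links(change_log, known_bug_ids):
    """Return the known bug IDs that a change log refers to explicitly."""
    # Require a bug-fix keyword before trusting any number in the log.
    if not KEYWORD.search(change_log):
        return set()
    # Collect every number that follows "bug", "issue", or "#".
    candidates = {int(number) for number in BUG_ID.findall(change_log)}
    # Keep only IDs that actually exist in the bug tracker.
    return candidates & known_bug_ids


known = {77, 1234}
explicit = find_links("Fixed NPE in parser, see #1234", known)
missing = find_links("Handle null input in parser", known)
```

In this sketch the second change log yields no link even if it did fix bug 77, because the log never mentions the bug. That is the kind of missing link the heuristic cannot recover.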

We manually inspected the explicit links, which have explicit bug IDs in change logs, and observed that they exhibit certain features. Based on this observation, we developed an automatic link recovery algorithm, ReLink, which automatically learns criteria for these features from explicit links in order to recover missing links. We applied ReLink to three open source projects. ReLink reliably identified links with 89% precision and 78% recall on average, while the traditional heuristics alone achieved 91% precision and 64% recall. We also evaluated the impact of recovered links on software maintainability measurement and defect prediction, and found that the results of ReLink yield significantly better accuracy than those of traditional heuristics.
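For readers unfamiliar with the metrics quoted above, the following sketch shows how precision and recall can be computed for a set of recovered links using the standard definitions. The function name, the (bug ID, revision) tuple representation, and the sample data are illustrative assumptions and are not taken from the ReLink tool or its data set.

```python
def precision_recall(recovered, true_links):
    """Compute precision and recall of recovered links against a ground truth."""
    correct = recovered & true_links
    # Precision: fraction of recovered links that are real links.
    precision = len(correct) / len(recovered) if recovered else 0.0
    # Recall: fraction of real links that were recovered.
    recall = len(correct) / len(true_links) if true_links else 0.0
    return precision, recall


# Hypothetical links, each a (bug ID, revision) pair.
recovered = {(1, "r10"), (2, "r11"), (3, "r12"), (4, "r13")}
true_links = {(1, "r10"), (2, "r11"), (3, "r12"), (5, "r14"), (6, "r15")}

precision, recall = precision_recall(recovered, true_links)
```

Here three of the four recovered links are correct, and three of the five true links are found. A method that only follows explicit bug IDs tends to keep precision high while losing recall, which matches the trade-off reported above.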

Keywords: Mining software repositories, missing links, data quality, bugs, changes.

 

Data and Tool

The ReLink tool and our experimental data are available for download: Tool.

Usage instructions are given in the README.txt file in the package.

 

Publications

 

Project Members

 

If you have any comments or questions regarding the research work or the tool, please feel free to contact any of the project members.

 

 

 


 

Last updated: April 2011.