CoCaBu --- Augmenting and Structuring User Queries to Support Efficient Free-Form Code Search

Speaker:        Dr. Dongsun Kim
                University of Luxembourg

Title:          "CoCaBu --- Augmenting and Structuring User Queries to
                 Support Efficient Free-Form Code Search"

Date:           Monday, 19 September 2016

Time:           4:00pm - 5:00pm

Venue:          Lecture Theatre F (near lifts 25/26), HKUST

Abstract:

Source code terms such as method names and variable types are often
different from conceptual words mentioned in a search query. This
vocabulary mismatch problem can make code search inefficient. In this
paper, we present COde voCABUlary (CoCaBu), an approach to resolving the
vocabulary mismatch problem when dealing with free-form code search
queries. Our approach leverages common developer questions and the
associated expert answers to augment user queries with the relevant, but
missing, structural code entities in order to improve the performance of
matching relevant code examples within large code repositories. To
instantiate this approach, we build GitSearch, a code search engine, on top
of GitHub.com and stackoverflow.com Q&A data. We evaluate GitSearch in
several dimensions to demonstrate that (1) its code search results are
correct with respect to user-accepted answers; (2) the results are
qualitatively better than those of existing Internet-scale code search
engines; (3) our engine is competitive against web search engines, such as
Google, in helping users solve programming tasks; and (4)
GitSearch provides code examples that are acceptable or interesting to the
community as answers for stackoverflow.com questions.
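
As a rough illustration of the query augmentation idea described above, the
following minimal Python sketch matches a free-form query against a tiny
in-memory set of question titles and appends the code entities found in the
corresponding answers. It is not GitSearch's actual implementation: the
QA_POSTS data, the function names, and the use of Jaccard token overlap as
the similarity measure are all assumptions made for this example.

```python
import re

# Hypothetical toy Q&A corpus: (question title, code entities from the
# accepted answer). Real data would come from stackoverflow.com posts.
QA_POSTS = [
    ("how to read a text file line by line",
     ["BufferedReader", "FileReader", "readLine"]),
    ("how to generate a random number in a range",
     ["Random", "nextInt"]),
    ("how to convert a string to an integer",
     ["Integer", "parseInt"]),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def augment_query(query, posts=QA_POSTS, top_k=1):
    """Return (sorted query terms, code entities from similar questions).

    Assumed similarity: Jaccard overlap between query and question title.
    Questions with zero overlap contribute no entities.
    """
    q_tokens = tokenize(query)
    ranked = sorted(
        posts,
        key=lambda post: jaccard(q_tokens, tokenize(post[0])),
        reverse=True,
    )
    entities = []
    for title, post_entities in ranked[:top_k]:
        if jaccard(q_tokens, tokenize(title)) > 0:
            for entity in post_entities:
                if entity not in entities:
                    entities.append(entity)
    return sorted(q_tokens), entities


if __name__ == "__main__":
    terms, entities = augment_query("read file line by line")
    print("free-form terms:", terms)
    print("added code entities:", entities)
```

In this sketch, the augmented query (free-form terms plus structural
entities such as type and method names) would then be run against a code
index, so that snippets can match even when they share no words with the
original query.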


*******************
Biography:

Dongsun Kim is a Research Associate at the University of Luxembourg. He
was formerly a post-doctoral fellow at the Hong Kong University of Science
and Technology. His research interests include automatic patch generation,
fault localization, static analysis, and search-based software engineering
(SBSE). In particular, automated debugging is his current focus. He is
also pursuing a topic on automatic code generation based on SBSE. His
recent work has been recognized with several awards, such as a featured
article in the IEEE Transactions on Software Engineering (TSE) and an ACM
SIGSOFT Distinguished Paper Award at the International Conference on
Software Engineering (ICSE). He is leading the FIXPATTERN project, funded
by the CORE program of the FNR (Luxembourg National Research Fund).