Can the use of types and query expansion help improve large-scale code search?

With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainl...

Full description

Saved in:
Bibliographic Details
Published in2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM) pp. 41 - 50
Main Authors Lazzarini Lemos, Otavio Augusto, de Paula, Adriano Carvalho, Sajnani, Hitesh, Lopes, Cristina V.
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.09.2015
Subjects
Online AccessGet full text
DOI10.1109/SCAM.2015.7335400

Cover

More Information
Summary:With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.
DOI:10.1109/SCAM.2015.7335400