Can the use of types and query expansion help improve large-scale code search?

Bibliographic Details
Published in: 2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 41-50
Main Authors: Lazzarini Lemos, Otavio Augusto; de Paula, Adriano Carvalho; Sajnani, Hitesh; Lopes, Cristina V.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2015
DOI: 10.1109/SCAM.2015.7335400

Abstract: With the open source code movement, code search with the intent of reuse has become increasingly popular, so much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search, or IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper, we describe an experiment that compares the effectiveness of KBCS with that of IDCS in the task of large-scale code search for auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects who produced real-world queries for 16 different auxiliary functions, and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.
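
To make the distinction in the abstract concrete, the following is a minimal Java sketch of keyword-only matching (KBCS) versus matching that also checks return and parameter types (IDCS), each with optional query expansion. It is an illustration only, not the search engine or ranking used in the paper: the class `CodeSearchSketch`, the `Signature` type, the boolean (unranked) matching, and the two tiny expansion tables are all assumptions made for this example. In particular, the synonym table stands in for WordNet and the type table stands in for whatever type relations the authors used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CodeSearchSketch {

    /** An indexed method or a query: name keywords plus return and parameter types. */
    static final class Signature {
        final Set<String> keywords;
        final String returnType;
        final List<String> paramTypes;

        Signature(String returnType, List<String> paramTypes, String... keywords) {
            this.returnType = returnType;
            this.paramTypes = paramTypes;
            this.keywords = new HashSet<>(Arrays.asList(keywords));
        }
    }

    // Hypothetical expansion tables, hard-coded for illustration only.
    static final Map<String, Set<String>> SYNONYMS =
            Map.of("reverse", Set.of("invert"));
    static final Map<String, Set<String>> COMPATIBLE_TYPES =
            Map.of("String", Set.of("CharSequence", "StringBuilder"));

    /** Returns the term itself plus, if enabled, the expansions listed for it in the table. */
    static Set<String> expand(String term, Map<String, Set<String>> table, boolean enabled) {
        Set<String> out = new HashSet<>();
        out.add(term);
        if (enabled) {
            out.addAll(table.getOrDefault(term, Set.of()));
        }
        return out;
    }

    /** KBCS: every query keyword (or one of its expansions) occurs among the method's keywords. */
    static boolean keywordMatch(Signature query, Signature method, boolean expandQuery) {
        for (String keyword : query.keywords) {
            boolean found = false;
            for (String alternative : expand(keyword, SYNONYMS, expandQuery)) {
                if (method.keywords.contains(alternative)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** IDCS: keyword match plus agreement on return type and on each parameter type, in order. */
    static boolean interfaceMatch(Signature query, Signature method, boolean expandQuery) {
        if (!keywordMatch(query, method, expandQuery)) {
            return false;
        }
        if (!expand(query.returnType, COMPATIBLE_TYPES, expandQuery).contains(method.returnType)) {
            return false;
        }
        if (query.paramTypes.size() != method.paramTypes.size()) {
            return false;
        }
        for (int i = 0; i < query.paramTypes.size(); i++) {
            Set<String> accepted = expand(query.paramTypes.get(i), COMPATIBLE_TYPES, expandQuery);
            if (!accepted.contains(method.paramTypes.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Recall: the fraction of all relevant methods that the search returned. */
    static double recall(int relevantReturned, int relevantTotal) {
        return relevantTotal == 0 ? 0.0 : (double) relevantReturned / relevantTotal;
    }

    public static void main(String[] args) {
        Signature query = new Signature("String", List.of("String"), "reverse");
        Signature exact = new Signature("String", List.of("String"), "reverse", "string");
        Signature variant = new Signature("StringBuilder", List.of("CharSequence"), "invert", "text");

        System.out.println(interfaceMatch(query, exact, false));   // true
        System.out.println(interfaceMatch(query, variant, false)); // false
        System.out.println(interfaceMatch(query, variant, true));  // true
        System.out.println(recall(2, 4));                          // 0.5
    }
}
```

In this toy setup, the differently named and typed `variant` method is only found once expansion is switched on, which is the kind of effect on recall the abstract attributes to combining types with query expansion. The sketch says nothing about precision or ranking.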
Authors and affiliations
Lazzarini Lemos, Otavio Augusto (otavio.lemos@unifesp.br): Science and Technology Department, Federal University of São Paulo at S. J. dos Campos, Brazil
de Paula, Adriano Carvalho (adriano.carvalho@unifesp.br): Science and Technology Department, Federal University of São Paulo at S. J. dos Campos, Brazil
Sajnani, Hitesh: Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA
Lopes, Cristina V. (lopes@ics.uci.edu): Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA
EISBN: 1467375292, 9781467375290
Subjects: code search; Context; Java; Natural languages; query expansion; Search engines; Semantics; software reuse; Thesauri
URI: https://ieeexplore.ieee.org/document/7335400