Can the use of types and query expansion help improve large-scale code search?
| Published in | 2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM) pp. 41 - 50 |
|---|---|
| Main Authors | Lazzarini Lemos, Otavio Augusto; de Paula, Adriano Carvalho; Sajnani, Hitesh; Lopes, Cristina V. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.09.2015 |
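| Subjects | code search; Context; Java; Natural languages; query expansion; Search engines; Semantics; software reuse; Thesauri |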
| DOI | 10.1109/SCAM.2015.7335400 |
| Abstract | With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function. |
|---|---|
| Author | Sajnani, Hitesh; de Paula, Adriano Carvalho; Lopes, Cristina V.; Lazzarini Lemos, Otavio Augusto |
| Author_xml | 1. Otavio Augusto Lazzarini Lemos (otavio.lemos@unifesp.br), Science and Technology Department, Federal University of São Paulo at S. J. dos Campos, Brazil; 2. Adriano Carvalho de Paula (adriano.carvalho@unifesp.br), Science and Technology Department, Federal University of São Paulo at S. J. dos Campos, Brazil; 3. Hitesh Sajnani, Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA; 4. Cristina V. Lopes (lopes@ics.uci.edu), Donald Bren School of Information and Computer Sciences, University of California, Irvine, USA |
| EISBN | 1467375292 9781467375290 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| PageCount | 10 |
| SubjectTerms | code search; Context; Java; Natural languages; query expansion; Search engines; Semantics; software reuse; Thesauri |
| URI | https://ieeexplore.ieee.org/document/7335400 |
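
The abstract contrasts keyword-based code search (KBCS) with interface-driven code search (IDCS), with and without query expansion. The sketch below is only a toy illustration of that contrast and is not the authors' tool or experimental setup. The five-method corpus, the `SYNONYMS` table (a hand-written stand-in for WordNet), the `RELATED_TYPES` table, and every function name are assumptions made for this example, and the numbers it yields say nothing about the results reported in the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    return_type: str
    param_types: tuple


# Hypothetical toy repository (the study indexed over 2,000,000 Java methods).
CORPUS = [
    Method("reverseString", "String", ("String",)),
    Method("invertText", "String", ("String",)),
    Method("flipString", "StringBuilder", ("String",)),
    Method("reverseStringInPlace", "void", ("char[]",)),
    Method("stringLength", "int", ("String",)),
]

# Hand-written stand-ins for WordNet synonyms and for related types (assumed).
SYNONYMS = {"reverse": {"invert", "flip"}, "string": {"text"}}
RELATED_TYPES = {"String": {"StringBuilder", "CharSequence"}}


def tokens(name):
    """Split a camelCase identifier into a set of lowercase words."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*", name)}


def expand(term, table, enabled):
    """Return the term plus its alternatives when expansion is enabled."""
    return {term} | (table.get(term, set()) if enabled else set())


def kbcs(keywords, expansion=False):
    """Keyword-based search: each keyword (or a synonym) must occur in the name."""
    return [m for m in CORPUS
            if all(expand(k, SYNONYMS, expansion) & tokens(m.name)
                   for k in keywords)]


def idcs(keywords, return_type, param_types, expansion=False):
    """Interface-driven search: keywords plus return and parameter types."""
    hits = []
    for m in kbcs(keywords, expansion):
        ret_ok = m.return_type in expand(return_type, RELATED_TYPES, expansion)
        params_ok = len(m.param_types) == len(param_types) and all(
            got in expand(want, RELATED_TYPES, expansion)
            for got, want in zip(m.param_types, param_types))
        if ret_ok and params_ok:
            hits.append(m)
    return hits


def recall(returned, relevant):
    """Fraction of the relevant methods that the search returned."""
    return len(set(returned) & set(relevant)) / len(relevant)
```

In this toy setting, treating the first three corpus entries as the relevant ones, `idcs(["reverse", "string"], "String", ("String",))` finds a single relevant method, while the same query with `expansion=True` finds all three. `kbcs` with expansion also reaches all three but additionally returns `reverseStringInPlace`, whose signature does not fit the query.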