Framework for evaluating code generation ability of large language models

Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass‐ratio@n, which captures the granul...

Bibliographic Details
Published in ETRI Journal, Vol. 46, no. 1, pp. 106-117
Main Authors Yeo, Sangyeop, Ma, Yu‐Seung, Kim, Sang Cheol, Jun, Hyungkook, Kim, Taeho
Format Journal Article
Language English
Published Electronics and Telecommunications Research Institute (ETRI) 01.02.2024
한국전자통신연구원
ISSN 1225-6463 (print), 2233-7326 (online)
DOI 10.4218/etrij.2023-0357

Abstract Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass‐ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated codes. A preliminary evaluation focusing on the prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass‐ratio@n metric.
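The abstract describes pass‐ratio@n only in words, as a metric that reflects the pass rate of test cases rather than an all-or-nothing verdict. The sketch below illustrates that general idea under a simplifying assumption: it averages, over n generated solutions, the plain fraction of test cases each solution passes. The function names (`pass_fraction`, `mean_pass_fraction`, `all_passed_rate`) are hypothetical, and the exact formula used in the article is not given in this record and may differ (for example, it may weight the ratio differently).

```python
from typing import Sequence


def pass_fraction(results: Sequence[bool]) -> float:
    """Fraction of test cases that one generated solution passed."""
    if not results:
        raise ValueError("at least one test case is required")
    return sum(results) / len(results)


def mean_pass_fraction(samples: Sequence[Sequence[bool]]) -> float:
    """Average per-solution pass fraction over n generated solutions.

    Assumption: a plain mean of fractions; the article's pass-ratio@n
    may be defined differently.
    """
    if not samples:
        raise ValueError("at least one generated solution is required")
    return sum(pass_fraction(s) for s in samples) / len(samples)


def all_passed_rate(samples: Sequence[Sequence[bool]]) -> float:
    """Share of solutions that pass every test case (all-or-nothing view)."""
    if not samples:
        raise ValueError("at least one generated solution is required")
    return sum(all(s) for s in samples) / len(samples)


if __name__ == "__main__":
    # Three generated solutions, each run against the same four test cases.
    runs = [
        [True, True, True, True],      # fully correct
        [True, True, False, False],    # partially correct
        [False, False, False, False],  # fails everything
    ]
    print("graded score:", mean_pass_fraction(runs))
    print("all-or-nothing score:", all_passed_rate(runs))
```

The graded score gives the partially correct solution some credit, which the all-or-nothing score cannot do; that difference is the granularity the abstract refers to.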
Authors and Affiliations
1. Yeo, Sangyeop (University of Science and Technology)
2. Ma, Yu‐Seung (Electronics and Telecommunications Research Institute), ORCID 0000-0002-4168-5515, email ysma@etri.re.kr
3. Kim, Sang Cheol (Electronics and Telecommunications Research Institute), ORCID 0000-0002-1925-2588
4. Jun, Hyungkook (Electronics and Telecommunications Research Institute)
5. Kim, Taeho (Electronics and Telecommunications Research Institute), ORCID 0000-0002-5061-206X
Copyright 1225‐6463/$ © 2024 ETRI
Discipline Engineering
EISSN 2233-7326
Open Access Yes
Peer Reviewed Yes
Scholarly Yes
Notes Funding information
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (2022‐0‐00995, automated reliable source code generation from natural language descriptions, 95%) and a National Research Council of Science & Technology (NST) grant (Global‐23‐001, SeCode: Collaborative intelligent model for secure program code generator, 5%) funded by the Korea government (MSIT).
https://doi.org/10.4218/etrij.2023-0357
OpenAccessLink https://doaj.org/article/0149d44d4b2944788e506b1366390abf
PageCount 12
Subject Terms code generation, evaluation metric, large language model, natural language processing, software engineering, electronics/information and communications engineering
URI https://onlinelibrary.wiley.com/doi/abs/10.4218%2Fetrij.2023-0357
https://doaj.org/article/0149d44d4b2944788e506b1366390abf
https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003054740