Framework for evaluating code generation ability of large language models
Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass‐ratio@n, which captures the granul...
Published in | ETRI journal Vol. 46; no. 1; pp. 106 - 117 |
---|---|
Main Authors | Yeo, Sangyeop; Ma, Yu‐Seung; Kim, Sang Cheol; Jun, Hyungkook; Kim, Taeho |
Format | Journal Article |
Language | English |
Published | Electronics and Telecommunications Research Institute (ETRI; 한국전자통신연구원), 01.02.2024 |
ISSN | 1225-6463 2233-7326 |
DOI | 10.4218/etrij.2023-0357 |
Abstract | Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass‐ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated code. A preliminary evaluation focusing on the prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass‐ratio@n metric. |
---|---|
Author | Ma, Yu‐Seung; Kim, Sang Cheol; Yeo, Sangyeop; Kim, Taeho; Jun, Hyungkook |
Author_xml | 1. Yeo, Sangyeop (University of Science and Technology); 2. Ma, Yu‐Seung (Electronics and Telecommunications Research Institute; ORCID 0000-0002-4168-5515; ysma@etri.re.kr); 3. Kim, Sang Cheol (Electronics and Telecommunications Research Institute; ORCID 0000-0002-1925-2588); 4. Jun, Hyungkook (Electronics and Telecommunications Research Institute); 5. Kim, Taeho (Electronics and Telecommunications Research Institute; ORCID 0000-0002-5061-206X) |
BackLink | https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003054740 (Access content in National Research Foundation of Korea (NRF)) |
CitedBy_id | crossref_primary_10_4218_etr2_12666 crossref_primary_10_1145_3714464 crossref_primary_10_3390_app142110048 |
Cites_doi | 10.4218/etrij.2019-0396; 10.3115/1073083.1073135; 10.4218/etrij.2021-0269; 10.4218/etrij.2020-0282; 10.1126/science.abq1158; 10.1109/COMPSAC57700.2023.00117; 10.1109/ICSE48619.2023.00035; 10.1145/3558489.3559072; 10.1145/3524842.3528470; 10.1145/3580305.3599790 |
ContentType | Journal Article |
Copyright | 1225‐6463/$ © 2024 ETRI |
DatabaseName | CrossRef DOAJ Directory of Open Access Journals Korean Citation Index |
Discipline | Engineering |
EISSN | 2233-7326 |
EndPage | 117 |
Genre | article |
GrantInformation_xml | National Research Council of Science & Technology (NST), funder ID Global‐23‐001; Institute of Information & communications Technology Planning & Evaluation, funder ID 2022‐0‐00995 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Notes | Funding information: This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (2022‐0‐00995, automated reliable source code generation from natural language descriptions, 95%) and a National Research Council of Science & Technology (NST) grant (Global‐23‐001, SeCode: Collaborative intelligent model for secure program code generator, 5%), both funded by the Korea government (MSIT). https://doi.org/10.4218/etrij.2023-0357 |
ORCID | 0000-0002-1925-2588 0000-0002-4168-5515 0000-0002-5061-206X |
OpenAccessLink | https://doaj.org/article/0149d44d4b2944788e506b1366390abf |
PageCount | 12 |
PublicationDate | February 2024 (2024-02-01) |
PublicationTitle | ETRI journal |
StartPage | 106 |
SubjectTerms | code generation; evaluation metric; large language model; natural language processing; software engineering; electronics/information and communication engineering (전자/정보통신공학) |
URI | https://onlinelibrary.wiley.com/doi/abs/10.4218%2Fetrij.2023-0357 https://doaj.org/article/0149d44d4b2944788e506b1366390abf https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART003054740 |
Volume | 46 |
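
The abstract describes pass‐ratio@n as a metric that reflects how many test cases each generated solution passes, rather than only whether a solution passes all of them. This record does not state the formula, so the following is only a minimal sketch under an assumption: each of the n generated solutions is scored by the fraction of test cases it passes, and the scores are averaged. The function names (`pass_ratio`, `mean_pass_ratio`, `all_or_nothing`) are hypothetical, and the published definition may weight the per-solution ratios differently; consult the full text before relying on it.

```python
from typing import Sequence, Tuple


def pass_ratio(passed: int, total: int) -> float:
    """Fraction of test cases that one generated solution passes."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


def mean_pass_ratio(results: Sequence[Tuple[int, int]]) -> float:
    """Average per-solution pass ratio over n generated solutions.

    ASSUMPTION: a plain mean is used for illustration; the article's
    exact aggregation is not given in this record.
    """
    if not results:
        raise ValueError("need at least one generated solution")
    return sum(pass_ratio(p, t) for p, t in results) / len(results)


def all_or_nothing(results: Sequence[Tuple[int, int]]) -> float:
    """Fraction of solutions that pass every test case (for contrast)."""
    if not results:
        raise ValueError("need at least one generated solution")
    return sum(1 for p, t in results if p == t) / len(results)


# Four generated solutions for one problem, as (passed, total) pairs.
results = [(10, 10), (5, 10), (0, 10), (5, 10)]
granular = mean_pass_ratio(results)   # partial credit for partial passes
strict = all_or_nothing(results)      # credit only for fully passing code
```

In this illustration the two partially correct solutions contribute to the granular score but not to the all-or-nothing score, which is the kind of distinction the abstract attributes to a pass-rate-based metric.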