Framework for evaluating code generation ability of large language models

Bibliographic Details
Published in: ETRI Journal, Vol. 46, No. 1, pp. 106–117
Main Authors: Yeo, Sangyeop; Ma, Yu-Seung; Kim, Sang Cheol; Jun, Hyungkook; Kim, Taeho
Format: Journal Article
Language: English
Published: Electronics and Telecommunications Research Institute (ETRI), 01.02.2024
Online Access: https://doi.org/10.4218/etrij.2023-0357
ISSN: 1225-6463; 2233-7326
DOI: 10.4218/etrij.2023-0357

Summary: Large language models (LLMs) have revolutionized various applications in natural language processing and exhibited proficiency in generating programming code. We propose a framework for evaluating the code generation ability of LLMs and introduce a new metric, pass-ratio@n, which captures the granularity of accuracy according to the pass rate of test cases. The framework is intended to be fully automatic to handle the repetitive work involved in generating prompts, conducting inferences, and executing the generated code. A preliminary evaluation focusing on the prompt detail, problem publication date, and difficulty level demonstrates the successful integration of our framework with the LeetCode coding platform and highlights the applicability of the pass-ratio@n metric.
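
The exact definition of pass-ratio@n appears in the full paper, not in this record. Purely as an illustration of the idea the summary describes, the Python sketch below assumes the metric averages the per-sample fraction of passing test cases over the n generated solutions for a problem; the function name pass_ratio_at_n and this particular aggregation are assumptions, not the paper's published formula.

from typing import List


def pass_ratio_at_n(test_results: List[List[bool]]) -> float:
    """Mean per-sample test-case pass rate over n generated solutions.

    test_results[i][j] is True when generated sample i passes test case j.
    This is one plausible reading of pass-ratio@n, not the paper's formula.
    """
    per_sample = [sum(r) / len(r) for r in test_results if r]
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


# Example: n = 3 generated samples for one problem, 4 test cases each.
results = [
    [True, True, True, True],     # all tests pass -> ratio 1.00
    [True, False, True, False],   # half pass      -> ratio 0.50
    [False, False, False, True],  # one passes     -> ratio 0.25
]
print(f"pass-ratio@3 = {pass_ratio_at_n(results):.3f}")  # 0.583

Unlike pass@k, which scores a sample 1 only when every test case passes, an average of per-sample pass rates like this one gives partial credit, which is the finer granularity the summary attributes to the metric.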
Funding information:
This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (2022-0-00995, Automated reliable source code generation from natural language descriptions, 95%) and a National Research Council of Science & Technology (NST) grant (Global-23-001, SeCode: Collaborative intelligent model for secure program code generator, 5%) funded by the Korea government (MSIT).