A Study for Language Models as Agents for Strategic Decision-making Environments

Bibliographic Details
Published in	Inteonet jeongbo hakoe nonmunji = Journal of Korean Society for Internet Information Vol. 26; no. 1; pp. 157 - 169
Main Authors	Jihwan Oh (오지환), Se-young Yun (윤세영)
Format	Journal Article
Language	Korean
Published	Korean Society for Internet Information (한국인터넷정보학회) 28.02.2025
Subjects	decision-making; instruction fine-tuning; language model
ISSN	1598-0170 2287-1136
DOI	10.7472/jksii.2025.26.1.157

Summary:	Language Models (LMs) have proven highly effective in reasoning, understanding, and decision-making tasks. However, Large Language Models (LLMs) like GPT-4 face challenges when deployed in complex real-time environments, including high costs and the need for extensive prompt engineering. These limitations make it difficult to use LLMs in dynamic and adaptive systems such as robotics, autonomous agents, and strategic decision-making environments. To address these challenges, we propose a framework that uses Small Language Models (SLMs) as agents in strategic decision-making environments. By applying fine-tuning methods such as supervised fine-tuning (SFT) and instruction fine-tuning (IFT) on reasoning datasets, we enhance the ability of smaller models like Llama-3.1-8B to handle multi-step decision processes efficiently. This approach significantly reduces computational overhead while maintaining high performance without the need for prompt engineering. Our experiments, conducted in the StarCraft Multi-Agent Challenge (SMAC), a battlefield environment, show that fine-tuned SLMs outperform LLMs in both performance and efficiency. Notably, SLMs performed well in real-time scenarios and handled unseen data effectively, demonstrating their robustness in dynamic environments. Eliminating prompt engineering further simplifies their use, making SLMs a practical, scalable alternative to LLMs for real-time decision-making.
Bibliography:	Korean Society for Internet Information KISTI1.1003/JNL.JAKO202509439605810