A Rule-Based Unmanned Coffee Voice Ordering System for Fast Response

Bibliographic Details
Published in	KIISE Transactions on Computing Practices, 31(4), pp. 172-178
Main Authors	양수빈, 김민태, 김동환, 정준섭, 최명재, 김학재, 이성주
Format	Journal Article
Language	Korean
Published	Korean Institute of Information Scientists and Engineers (KIISE) 01.04.2025
Subjects	Computer Science
ISSN	2383-6318 2383-6326

Summary:	Recent research has explored the use of Large Language Models (LLMs) to automate ordering in drive-thru environments. However, LLMs incur high computational and network costs. In a restricted language environment such as coffee ordering, a rule-based system can provide accuracy similar to that of LLMs at a much faster processing speed. In this paper, we propose a rule-based coffee voice ordering system composed of a knowledge base and an inference engine, and compare its performance with LLM-based systems. The coffee ordering system used in this study is based on a restricted language environment consisting of 540 combinations in total (9 menu items × quantities from 1 to 10 × 3 sizes × 2 temperature options). In the experiments, the proposed method and the LLM-based methods achieved accuracies of 98%, 80%, and 93%, respectively, with processing times of 6.79×10⁻⁵ seconds, 3.86×10⁰ seconds, and 1.70×10⁰ seconds, respectively (the first figure in each list is the proposed method). These results confirm that the proposed method provides accuracy comparable to LLM-based approaches while running up to approximately 50,000 times faster. In addition, we compared the accuracy and speed of six Automatic Speech Recognition (ASR) models for a voice ordering system that integrates the proposed method. The results confirm that the Google Speech Recognition model provided the best performance, with an average processing time of 0.97 seconds and a Character Error Rate (CER) of 0.15.
KCI Citation Count: 0
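
The abstract describes the system only as a knowledge base plus an inference engine over 540 order combinations. As a rough illustration of that idea, and not the authors' implementation, the following minimal Python sketch fills four order slots by dictionary lookup. The menu, size, and temperature entries and the name `parse_order` are hypothetical; the source gives only the counts (9 menu items, quantities 1 to 10, 3 sizes, 2 temperatures). The sketch also assumes the ASR output has already been transcribed to English text.

```python
import re
from itertools import product

# Hypothetical knowledge base: the source states only the counts
# (9 menu items, quantities 1-10, 3 sizes, 2 temperatures), not the entries.
MENU = ["americano", "espresso", "latte", "cappuccino", "mocha",
        "macchiato", "flat white", "cold brew", "affogato"]
SIZES = ["small", "medium", "large"]
TEMPS = ["hot", "iced"]
QUANTITIES = range(1, 11)
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def parse_order(text):
    """Fill the four order slots by rule matching; return None if any is missing."""
    # Normalize: lowercase, drop punctuation, keep alphanumeric tokens only.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    padded = " " + " ".join(tokens) + " "

    def find(options):
        # Whole-phrase match, so multi-word entries such as "flat white" work.
        return next((o for o in options if f" {o} " in padded), None)

    quantity = None
    for tok in tokens:
        if tok.isdigit() and int(tok) in QUANTITIES:
            quantity = int(tok)
            break
        if tok in NUMBER_WORDS:
            quantity = NUMBER_WORDS[tok]
            break

    order = {"menu": find(MENU), "quantity": quantity,
             "size": find(SIZES), "temperature": find(TEMPS)}
    # A fuller system might ask a follow-up question for a missing slot;
    # this sketch simply rejects incomplete orders.
    return order if all(v is not None for v in order.values()) else None


# The order space matches the count stated in the abstract: 9 * 10 * 3 * 2.
ORDER_SPACE = list(product(MENU, QUANTITIES, SIZES, TEMPS))

if __name__ == "__main__":
    print(len(ORDER_SPACE))
    print(parse_order("Two large iced latte, please"))
```

Because every slot is resolved by scanning a handful of fixed phrases, with no model inference or network round trip, a design of this kind is consistent with the sub-millisecond processing time reported for the proposed method, although the actual rules used in the paper may differ.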