Development and Enhancement of a Morpheme Analysis Corpus Program for Historical Materials (역사 자료 형태소 분석 말뭉치 프로그램 개발 및 고도화)

Bibliographic Details
Published in 언어와 정보 사회 (Language and Information Society) Vol. 54; pp. 191-219
Main Authors Jang Yohan (장요한), Ock Choelyoung (옥철영), Shin Seungyong (신승용), Park Sion (박시온)
Format Journal Article
Language Korean
Published Language and Information Institute, Sogang University (서강대학교 언어정보연구소) 31.03.2025
ISSN 1598-1886, 2713-6817
DOI 10.29211/soli.2025.54..007

Summary: This paper describes in detail how the historical data morpheme analysis program ‘UTagger-Hunminjeongeum’ was developed and enhanced. It introduces the morpheme analysis algorithm of ‘UTagger-Hunminjeongeum (ver. 0.9)’ and outlines the steps taken to improve it. It also presents the structure of ‘UTagger-Hunminjeongeum TCM’, a tagging tool developed independently to reduce manual errors and save time; the tool is used to build the small-scale morpheme-analyzed corpus on which ‘UTagger-Hunminjeongeum’ is trained. The paper further discusses enhancements made after the training phase, such as converting Chinese-character tagging into Hangul and tagging tone marks (bangjeom). The program reaches an accuracy of nearly 90% on trained data and over 80% on untrained data, for an overall accuracy of 85% to 90%. With continued development and more diverse data, it is expected to become a versatile and highly accurate morphological analysis tool.
Bibliography: Language and Information Institute, Sogang University