Development and Enhancement of a Morphological Analysis Corpus Program for Historical Materials (역사 자료 형태소 분석 말뭉치 프로그램 개발 및 고도화)
| Published in | 언어와 정보 사회 Vol. 54; pp. 191-219 |
|---|---|
| Main Authors | , , , , , , , | 
| Format | Journal Article | 
| Language | Korean | 
| Published | 서강대학교 언어정보연구소 (Language and Information Institute, Sogang University), 31.03.2025 |
| Subjects | |
| ISSN | 1598-1886 2713-6817  | 
| DOI | 10.29211/soli.2025.54..007 | 
| Summary: | This paper describes in detail the development and enhancement of ‘UTagger-Hunminjeongeum’, a morphological analysis program for historical materials. We introduce the morphological analysis algorithm of ‘UTagger-Hunminjeongeum (ver. 0.9)’ and outline the steps taken to improve it. We also present the structure of ‘UTagger-Hunminjeongeum TCM’, a tagging tool developed independently to reduce manual errors and save time; it is used to build a small-scale morphologically analyzed corpus for training ‘UTagger-Hunminjeongeum’. The paper then discusses enhancements made after the training phase, such as converting Chinese character tagging into Hangul and tagging tone marks (bangjeom). The program achieves an accuracy of nearly 90% on trained data and over 80% on untrained data, for an overall accuracy between 85% and 90%. With continued development and more diverse data, it is expected to become a versatile and highly accurate morphological analysis tool. |
| Bibliography: | Language and Information Institute Sogang University | 