Text corpus for natural language story-telling sentence generation: A design and evaluation

Bibliographic Details
Published in JCSSE 2014: 2014 11th International Joint Conference on Computer Science and Software Engineering, Chonburi, Thailand, May 14-16, 2014, pp. 80-85
Main Authors Limpanadusadee, Worasa; Punyabukkana, Proadpran; Suchato, Atiwong; Poobrasert, Onintra
Format Conference Proceeding
Language English
Published IEEE 01.05.2014
DOI 10.1109/JCSSE.2014.6841846

Summary: Automatic generation of narrative sentences from unordered word sets is desirable in Augmentative and Alternative Communication (AAC) systems for children with certain learning disabilities (LD). Regardless of the complexity of the Natural Language Processing deployed in the sentence generation procedure, the quality of the language model always affects the generation results. This work compared the sentence generation accuracies obtained when a multi-tier N-gram-based procedure was trained on BEST2010, a large publicly available text corpus, and on a smaller but more specifically designed corpus, in the task of Thai simple sentence generation. The latter, a new corpus called TELL-S, was created from an analysis of the contents of the textbooks used in grade 1 and grade 2 Thai language subjects under the compulsory curriculum for Thai schools. The original procedure was also modified to incorporate additional constraints based on a story-telling guideline developed for LD children. Evaluated on test sets of 195 sentences, each composed of 3-6 words with a specific Part-Of-Speech combination, TELL-S was shown to provide better generalization and yielded higher accuracies than BEST2010 in all cases with unbiased word sets. The sentence generation accuracies were 100% and 70% for 3-word and 4-word sentences, respectively. The average accuracy was 58.8% when longer sentences were also included.
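To make the task in the summary concrete, the following is a minimal sketch of ordering an unordered word set with an N-gram language model. It is not the authors' multi-tier procedure: it assumes a single bigram tier with add-one smoothing and an exhaustive search over permutations, and it does not model the Part-Of-Speech or story-telling constraints. The function names (`train_bigram`, `log_prob`, `generate`) and the English toy corpus are hypothetical placeholders; the paper works with Thai text from BEST2010 and TELL-S.

```python
from collections import Counter
from itertools import permutations
import math

START, END = "<s>", "</s>"

def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts = Counter()
    bigrams = Counter()
    vocab = {START, END}
    for words in sentences:
        padded = [START] + list(words) + [END]
        vocab.update(words)
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)

def log_prob(order, model):
    """Add-one smoothed bigram log-probability of one candidate word order."""
    contexts, bigrams, vocab_size = model
    padded = [START] + list(order) + [END]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def generate(word_set, model):
    """Return the permutation of word_set that the model scores highest."""
    return max(permutations(word_set), key=lambda order: log_prob(order, model))

# Toy training corpus (hypothetical placeholder data, not from the paper).
corpus = [
    ["the", "boy", "eats", "rice"],
    ["the", "girl", "eats", "fish"],
    ["the", "boy", "reads", "books"],
]
model = train_bigram(corpus)
best_order = generate(["rice", "boy", "the", "eats"], model)
print(" ".join(best_order))
```

Exhaustive search is only practical for short inputs such as the 3-6 word sentences mentioned in the summary, since the number of candidate orders grows factorially with the number of words.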