Feature-Level Decomposition of Text Complexity: Cross-Domain Empirical Evidence

Bibliographic Details
Published in: Computer Science Journal of Moldova, Vol. 33, No. 2(98), pp. 257-280
Main Authors: Parahonco, Alexandr; Parahonco, Liudmila
Format: Journal Article
Language: English
Published: Vladimir Andrunachievici Institute of Mathematics and Computer Science, 01.09.2025
ISSN: 1561-4042, 2587-4330
DOI: 10.56415/csjm.v33.13

Summary: This study presents a feature-level analysis of text complexity using large language models (LLMs) in a two-phase design. Phase I operationalized six core features - lexical diversity, density, syntactic complexity, coherence, named entities, and readability - achieving Spearman correlations of 0.55-0.60 across domains. Phase II employed indirect prompting to surface additional qualitative dimensions (e.g., inferential load, rhetorical structure), yielding a mean correlation of 0.42 and revealing that the six features account for 40% of complexity variance. Domain dependencies were limited to named entities and lexical diversity. We propose a hybrid model combining normalization, root-based synergies, and newly quantified metrics with domain-tuned formulae for improved prediction.
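The six features are only named in the abstract, so as a rough illustration of the feature-level setup the following Python sketch operationalizes two of them with simple proxies (type-token ratio for lexical diversity, Flesch Reading Ease for readability) and correlates them with human complexity ratings via scipy.stats.spearmanr. The helper functions, the toy texts, and the ratings are illustrative assumptions, not the paper's actual metrics or data.

# Minimal sketch (not the authors' implementation), assuming simple proxies for
# two of the six features: lexical diversity as type-token ratio and readability
# as Flesch Reading Ease, correlated with hypothetical human complexity ratings.
import re
from scipy.stats import spearmanr

def type_token_ratio(text: str) -> float:
    """Lexical diversity proxy: distinct word forms over total word count."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def count_syllables(word: str) -> int:
    """Crude syllable heuristic: contiguous vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Readability proxy: higher values indicate easier text."""
    words = re.findall(r"[a-zA-Z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Hypothetical mini-corpus with invented complexity ratings (1 = simple, 5 = complex).
texts = [
    "The cat sat on the mat. It was warm.",
    "Rainfall patterns shifted, so farmers adjusted their planting schedules.",
    "Quantum decoherence constrains the scalability of superconducting qubit arrays.",
]
ratings = [1, 3, 5]

ttr = [type_token_ratio(t) for t in texts]
fre = [flesch_reading_ease(t) for t in texts]

# Spearman's rank correlation between each feature and the ratings; the study's
# reported 0.55-0.60 values come from its own feature set and corpora.
rho_ttr, _ = spearmanr(ttr, ratings)
rho_fre, _ = spearmanr(fre, ratings)
print(f"type-token ratio vs. ratings: rho = {rho_ttr:.2f}")
print(f"Flesch Reading Ease vs. ratings: rho = {rho_fre:.2f}")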