Feature-Level Decomposition of Text Complexity: Cross-Domain Empirical Evidence
| Published in | Computer Science Journal of Moldova, Vol. 33, no. 2(98), pp. 257-280 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Vladimir Andrunachievici Institute of Mathematics and Computer Science, 01.09.2025 |
| ISSN | 1561-4042, 2587-4330 |
| DOI | 10.56415/csjm.v33.13 |
| Summary: | This study presents a feature-level analysis of text complexity using large language models (LLMs) in a two-phase design. Phase I operationalized six core features - lexical diversity, density, syntactic complexity, coherence, named entities, and readability - achieving Spearman correlations of 0.55-0.60 across domains. Phase II employed indirect prompting to surface additional qualitative dimensions (e.g., inferential load, rhetorical structure), yielding a mean correlation of 0.42 and revealing that the six features account for 40% of complexity variance. Domain dependencies were limited to named entities and lexical diversity. We propose a hybrid model combining normalization, root-based synergies, and newly quantified metrics with domain-tuned formulae for improved prediction. |
|---|---|
| ISSN: | 1561-4042, 2587-4330 |
| DOI: | 10.56415/csjm.v33.13 |
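The summary reports Spearman correlations (0.55-0.60 in Phase I, mean 0.42 in Phase II) between feature-based complexity scores and reference judgments. As an illustrative sketch only (the paper's actual evaluation pipeline is not reproduced here), Spearman's rho can be computed as the Pearson correlation of the two variables' ranks:

```python
def rank(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson correlation applied to the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For example, a perfectly monotone relationship between predicted and reference complexity scores yields rho = 1.0, regardless of scale; this rank-based behavior is why Spearman (rather than Pearson) correlation suits ordinal complexity ratings.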