Feature-Level Decomposition of Text Complexity: Cross-Domain Empirical Evidence

Bibliographic Details
Published in: Computer Science Journal of Moldova, Vol. 33, No. 2(98), pp. 257-280
Main Authors: Parahonco, Alexandr; Parahonco, Liudmila
Format: Journal Article
Language: English
Published: Vladimir Andrunachievici Institute of Mathematics and Computer Science, 01.09.2025
ISSN: 1561-4042, 2587-4330
DOI: 10.56415/csjm.v33.13

Summary: This study presents a feature-level analysis of text complexity using large language models (LLMs) in a two-phase design. Phase I operationalized six core features - lexical diversity, density, syntactic complexity, coherence, named entities, and readability - achieving Spearman correlations of 0.55-0.60 across domains. Phase II employed indirect prompting to surface additional qualitative dimensions (e.g., inferential load, rhetorical structure), yielding a mean correlation of 0.42 and revealing that the six features account for 40% of complexity variance. Domain dependencies were limited to named entities and lexical diversity. We propose a hybrid model combining normalization, root-based synergies, and newly quantified metrics with domain-tuned formulae for improved prediction.
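The six features are only named in the abstract, so as a rough illustration of the feature-level setup the following Python sketch operationalizes two of them with simple proxies (type-token ratio for lexical diversity, Flesch Reading Ease for readability) and correlates them with human complexity ratings via scipy.stats.spearmanr. The helper functions, the toy texts, and the ratings are illustrative assumptions, not the paper's actual metrics or data.

# Minimal sketch (not the authors' implementation), assuming simple proxies for
# two of the six features: lexical diversity as type-token ratio and readability
# as Flesch Reading Ease, correlated with hypothetical human complexity ratings.
import re
from scipy.stats import spearmanr

def type_token_ratio(text: str) -> float:
    """Lexical diversity proxy: distinct word forms over total word count."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def count_syllables(word: str) -> int:
    """Crude syllable heuristic: contiguous vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Readability proxy: higher values indicate easier text."""
    words = re.findall(r"[a-zA-Z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Hypothetical mini-corpus with invented complexity ratings (1 = simple, 5 = complex).
texts = [
    "The cat sat on the mat. It was warm.",
    "Rainfall patterns shifted, so farmers adjusted their planting schedules.",
    "Quantum decoherence constrains the scalability of superconducting qubit arrays.",
]
ratings = [1, 3, 5]

ttr = [type_token_ratio(t) for t in texts]
fre = [flesch_reading_ease(t) for t in texts]

# Spearman's rank correlation between each feature and the ratings; the study's
# reported 0.55-0.60 values come from its own feature set and corpora.
rho_ttr, _ = spearmanr(ttr, ratings)
rho_fre, _ = spearmanr(fre, ratings)
print(f"type-token ratio vs. ratings: rho = {rho_ttr:.2f}")
print(f"Flesch Reading Ease vs. ratings: rho = {rho_fre:.2f}")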