Respondent Simulation and Construct Validation: A Framework for Developing LLM Privacy Surveys Under Data Scarcity

Bibliographic Details
Published in	2025 IEEE International Conference on Intelligence and Security Informatics (ISI) pp. 70 - 76
Main Authors	Meng, Xiuzhe; Li, Lexin
Format	Conference Proceeding
Language	English
Published	IEEE 12.07.2025
Subjects	Data collection; Data models; Data privacy; Large-scale language modeling; Mathematical models; Privacy; privacy security; Psychology; questionnaire generation; Reliability theory; Security; simulated data; Solids; Surveys
ISSN	2837-6617
DOI	10.1109/ISI65680.2025.11201143

More Information
Summary:	As Large Language Models (LLMs) are widely adopted, the privacy and security issues they raise have drawn increasing attention. Existing studies often have difficulty accessing real user data, which hinders the development and validation of user privacy concern scales. This study proposes a structured, intelligent, and theoretically grounded method for rapidly generating an LLM privacy concern scale and validating its effectiveness. The study first combines the theoretical framework of structural equation modeling (SEM) with the BRASS instruction-driven mechanism to construct a set of instruction tools for automatically generating questionnaire items. It then draws on social psychology and risk perception theories to construct the characteristics of a simulated respondent group with high ecological validity. Using a theory-driven scoring function and path model, the study generates simulated response data and embeds a human-computer collaborative mechanism, combining LLM self-checking with expert review, to ensure the quality and theoretical consistency of the generated data. Preliminary validation using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) shows that the simulated data and scales have good internal consistency and structural validity. The results indicate that the method can provide rapid, cost-effective validation of theoretical constructs when real data are unavailable, improving the efficiency and methodological rigor of research on LLM privacy and security and supporting subsequent large-scale empirical research.
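
The summary describes generating simulated response data from a scoring function and then checking internal consistency. The record gives no details of that scoring function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical one-factor model in which each item score is a loading times a latent "privacy concern" value plus noise, cut into a 5-point Likert scale, and it uses Cronbach's alpha as one common internal-consistency measure. The names simulate_likert and cronbach_alpha, the loadings, and the thresholds are all illustrative assumptions.

```python
import random
import statistics


def simulate_likert(n_respondents, loadings, seed=0,
                    thresholds=(-1.5, -0.5, 0.5, 1.5)):
    """Simulate Likert responses from a hypothetical one-factor model.

    Each continuous item score is loading * factor + noise, with the noise
    variance chosen so that every item has unit variance. The score is then
    cut into ordered categories 1..len(thresholds)+1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_respondents):
        factor = rng.gauss(0.0, 1.0)  # latent concern level of one respondent
        row = []
        for lam in loadings:
            noise = rng.gauss(0.0, (1.0 - lam ** 2) ** 0.5)
            score = lam * factor + noise
            # Category = 1 + number of thresholds the score exceeds.
            row.append(1 + sum(score > t for t in thresholds))
        rows.append(row)
    return rows


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Assumed loadings for four items that measure a single construct.
responses = simulate_likert(500, [0.8, 0.75, 0.7, 0.65])
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

A full workflow along the lines of the summary would go further and fit EFA and CFA models to the simulated table. That step is omitted here because the record does not say which software or model specification was used.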