Respondent Simulation and Construct Validation: A Framework for Developing LLM Privacy Surveys Under Data Scarcity
| Published in | 2025 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 70-76 |
|---|---|
| Main Authors | , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 12.07.2025 |
| ISSN | 2837-6617 |
| DOI | 10.1109/ISI65680.2025.11201143 |
| Summary: | As Large Language Models (LLMs) see widespread use, the privacy and security issues they raise have received growing attention. Existing studies often struggle to obtain real user data, which hinders the development and validation of user privacy concern scales. This study proposes a structured, intelligent, and theoretically robust method for rapidly generating an LLM privacy concern scale and validating its effectiveness. The study first combines the theoretical framework of structural equation modeling (SEM) with the BRASS instruction-driven mechanism to construct a set of instruction tools that automatically generate questionnaire items. It then draws on social psychology and risk perception theories to construct the characteristics of a simulated respondent group with high ecological validity. Using a theory-driven scoring function and path model, the study generates simulated response data and embeds a human-computer collaborative mechanism, combining LLM self-checking with expert review, to ensure the quality and theoretical consistency of the generated data. Preliminary validation using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) shows that the generated simulated data and scales have good internal consistency and structural validity. The results confirm that the method can provide rapid, cost-effective validation of theoretical constructs in the absence of real data, improving the efficiency and methodological rigor of research on LLM privacy and security and providing an important tool to support subsequent large-scale empirical research. |
|---|---|
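
The summary describes generating simulated response data from a theory-driven scoring function and then checking internal consistency. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: a single latent factor stands in for the paper's path model, responses are mapped to a 5-point Likert scale, and Cronbach's alpha stands in for the fuller EFA/CFA validation. The function names, loadings, and noise level are hypothetical and are not taken from the paper.

```python
import random
import statistics


def simulate_responses(n_respondents, loadings, noise_sd=0.7, seed=0):
    """Simulate 5-point Likert answers from a one-factor model (an assumption).

    Each simulated respondent gets a latent score eta ~ N(0, 1). The answer
    to item j is loadings[j] * eta plus Gaussian noise, shifted to the scale
    midpoint (3), rounded, and clipped to the range 1..5.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_respondents):
        eta = rng.gauss(0.0, 1.0)  # hypothetical latent "privacy concern"
        row = []
        for lam in loadings:
            score = lam * eta + rng.gauss(0.0, noise_sd)
            row.append(min(5, max(1, round(3 + score))))
        data.append(row)
    return data


def cronbach_alpha(data):
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total))."""
    k = len(data[0])
    item_vars = [statistics.variance([row[j] for row in data]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical loadings for a four-item scale.
    responses = simulate_responses(500, [0.8, 0.8, 0.7, 0.9])
    print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

Because every item here is driven by the same latent score, alpha comes out high by construction. With simulated data, a consistency check of this kind shows that the generator and the scale agree with the assumed theory; it does not provide independent evidence about real respondents.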