Respondent Simulation and Construct Validation: A Framework for Developing LLM Privacy Surveys Under Data Scarcity

Bibliographic Details
Published in	2025 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 70-76
Main Authors	Meng, Xiuzhe; Li, Lexin
Format	Conference Proceeding
Language	English
Published	IEEE 12.07.2025
Subjects	Data collection; Data models; Data privacy; Large-scale language modeling; Mathematical models; Privacy; privacy security; Psychology; questionnaire generation; Reliability theory; Security; simulated data; Solids; Surveys
Online Access	Get full text
ISSN	2837-6617
DOI	10.1109/ISI65680.2025.11201143

Abstract	With the widespread application of Large Language Models (LLMs), the privacy and security issues arising from LLMs have gradually received widespread attention. Existing studies generally face the problem of difficult access to real user data, which hinders the effective development and validation of user privacy concern scales. This study aims to propose a structured, intelligent, and theoretically robust method to rapidly generate the LLM privacy concern scale and validate its effectiveness. The study first combines the theoretical framework of structural equation modeling (SEM) with the BRASS instruction-driven mechanism to construct a set of instruction tools for automatic generation of questionnaire items, followed by the construction of high ecological validity simulated subject group characteristics using social psychology and risk perception theories. Based on the theory-driven scoring function and path model, the study further generates simulated response data and embeds a human-computer collaborative mechanism of LLM self-checking and expert review to ensure the quality and theoretical consistency of the generated data. Preliminary validation results using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) show that the generated simulated data and scales have good internal consistency and structural validity. The results confirm that the method is capable of providing rapid and cost-effective theoretical conceptual validation in the absence of real data, which improves the efficiency and methodological rigor of research in the area of LLM privacy and security, and provides an important tool to support subsequent large-scale empirical research.
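The abstract describes generating simulated response data from a path model and then checking internal consistency. As a rough illustration of that idea only, the sketch below draws Likert-style answers from a single latent "privacy concern" factor and computes Cronbach's alpha. It is not the authors' scoring function: the one-factor structure, the loadings of 0.8, the category cut points, the sample size, and the function names `simulate_likert` and `cronbach_alpha` are all assumptions made for this example.

```python
import numpy as np


def simulate_likert(n_respondents=500, loadings=(0.8,) * 6, seed=0):
    """Simulate 5-point Likert answers from one latent factor (assumed model)."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    # One standard-normal latent score per simulated respondent.
    latent = rng.standard_normal((n_respondents, 1))
    # Unique (error) variance chosen so each continuous item has unit variance.
    noise_sd = np.sqrt(1.0 - lam ** 2)
    noise = rng.standard_normal((n_respondents, lam.size)) * noise_sd
    continuous = latent * lam + noise
    # Hypothetical cut points that map continuous scores to categories 1..5.
    cuts = np.array([-1.5, -0.5, 0.5, 1.5])
    return np.digitize(continuous, cuts) + 1


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


responses = simulate_likert()
alpha = cronbach_alpha(responses)
print(f"simulated responses: {responses.shape}, Cronbach's alpha = {alpha:.3f}")
```

Because every item here is driven by the same factor by construction, a high alpha is expected and says nothing about real respondents; in practice the simulated matrix would next be passed to EFA/CFA software, which this sketch does not attempt.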
Author_xml	1. Meng, Xiuzhe (xiuzhe.meng@ia.ac.cn), Tianjin University, Department of Management and Economics, Tianjin, China; 2. Li, Lexin (l12lxn@nefu.edu.cn), School of Computer and Control Engineering, Northeast Forestry University, Harbin, China
Discipline	Psychology
EISBN	9798331512767
EISSN	2837-6617
EndPage	76
ExternalDocumentID	11201143
Genre	orig-research
GrantInformation_xml	National Natural Science Foundation of China, grant 72293575 (funder ID 10.13039/501100001809)
IngestDate	Wed Oct 29 06:13:01 EDT 2025
IsPeerReviewed	false
IsScholarly	false
PublicationCentury	2000
PublicationDate	2025-July-12
PublicationDateYYYYMMDD	2025-07-12
PublicationDate_xml	– month: 07 year: 2025 text: 2025-July-12 day: 12
PublicationDecade	2020
PublicationTitle	2025 IEEE International Conference on Intelligence and Security Informatics (ISI)
PublicationTitleAbbrev	ISI
PublicationYear	2025
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	70
URI	https://ieeexplore.ieee.org/document/11201143