Respondent Simulation and Construct Validation: A Framework for Developing LLM Privacy Surveys Under Data Scarcity
| Published in | 2025 IEEE International Conference on Intelligence and Security Informatics (ISI) pp. 70 - 76 |
|---|---|
| Main Authors | Meng, Xiuzhe; Li, Lexin |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 12.07.2025 |
| Subjects | |
| ISSN | 2837-6617 |
| DOI | 10.1109/ISI65680.2025.11201143 |
| Abstract | As Large Language Models (LLMs) are widely deployed, the privacy and security issues they raise have drawn growing attention. Existing studies often struggle to obtain real user data, which hinders the development and validation of user privacy concern scales. This study proposes a structured, intelligent, and theoretically robust method to rapidly generate an LLM privacy concern scale and validate its effectiveness. The study first combines the theoretical framework of structural equation modeling (SEM) with the BRASS instruction-driven mechanism to construct a set of instruction tools for automatically generating questionnaire items. It then constructs simulated respondent group profiles with high ecological validity, drawing on social psychology and risk perception theories. Using a theory-driven scoring function and path model, the study generates simulated response data and embeds a human-computer collaborative mechanism, combining LLM self-checking with expert review, to ensure the quality and theoretical consistency of the generated data. Preliminary validation using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) shows that the generated simulated data and scales have good internal consistency and structural validity. The results confirm that the method can provide rapid, cost-effective conceptual validation when real data are unavailable, which improves the efficiency and methodological rigor of research on LLM privacy and security and provides an important tool to support subsequent large-scale empirical research. |
| Author | Li, Lexin; Meng, Xiuzhe |
| Author_xml | 1. Meng, Xiuzhe (xiuzhe.meng@ia.ac.cn), Tianjin University, Department of Management and Economics, Tianjin, China; 2. Li, Lexin (l12lxn@nefu.edu.cn), School of Computer and Control Engineering, Northeast Forestry University, Harbin, China |
| ContentType | Conference Proceeding |
| Discipline | Psychology |
| EISBN | 9798331512767 |
| EISSN | 2837-6617 |
| EndPage | 76 |
| ExternalDocumentID | 11201143 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Natural Science Foundation of China grantid: 72293575 funderid: 10.13039/501100001809 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PublicationCentury | 2000 |
| PublicationDate | 2025-July-12 |
| PublicationDateYYYYMMDD | 2025-07-12 |
| PublicationDate_xml | – month: 07 year: 2025 text: 2025-July-12 day: 12 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 IEEE International Conference on Intelligence and Security Informatics (ISI) |
| PublicationTitleAbbrev | ISI |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 70 |
| SubjectTerms | Data collection; Data models; Data privacy; Large-scale language modeling; Mathematical models; Privacy; privacy security; Psychology; questionnaire generation; Reliability theory; Security; simulated data; Solids; Surveys |
| Title | Respondent Simulation and Construct Validation: A Framework for Developing LLM Privacy Surveys Under Data Scarcity |
| URI | https://ieeexplore.ieee.org/document/11201143 |
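
The abstract describes generating simulated response data from a theory-driven scoring function and path model, then checking internal consistency. The record does not give the authors' scoring function, so the following is only a minimal sketch under stated assumptions: a one-factor model with hypothetical loadings, fixed cut points for a 5-point Likert scale, and Cronbach's alpha as the consistency measure. The function names, loadings, and thresholds are illustrative and are not taken from the paper.

```python
import random
import statistics

# Hypothetical cut points that map a standardized score to a 1..5 Likert answer.
CUTS = (-1.5, -0.5, 0.5, 1.5)

def simulate_likert(n_respondents, loadings, seed=0):
    """Simulate Likert answers from a one-factor model (an assumed stand-in
    for the paper's scoring function, which this record does not specify)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_respondents):
        latent = rng.gauss(0.0, 1.0)  # simulated respondent's latent privacy concern
        row = []
        for lam in loadings:
            # Item score = loading * latent + noise scaled to keep unit variance.
            score = lam * latent + rng.gauss(0.0, (1.0 - lam ** 2) ** 0.5)
            # Count how many cut points the score reaches, giving 1..5.
            row.append(1 + sum(score >= c for c in CUTS))
        rows.append(row)
    return rows

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Illustrative loadings for four items measuring one construct.
responses = simulate_likert(500, [0.8, 0.75, 0.7, 0.65])
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha on simulated data: {alpha:.2f}")
```

Because the data are generated from the same factor structure that is later tested, a high alpha here mainly confirms that the simulation behaves as designed. A fuller check in the spirit of the abstract would fit EFA and CFA models to the simulated responses.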