Comparing human coding to two natural language processing algorithms in aspirations of people affected by Duchenne Muscular Dystrophy
| Published in | Journal of methods and measurement in the social sciences Vol. 13; no. 1 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | University of Arizona Libraries, 2022-10-01 |
| ISSN | 2159-7855 |
| DOI | 10.2458/jmmss.5397 |
| Abstract | Qualitative methods can enhance our understanding of constructs that have not been well portrayed and enable nuanced depiction of experience from study participants who have not been broadly studied. However, qualitative data require time and effort to train raters to achieve validity and reliability. This study compares recent advances in Natural Language Processing (NLP) models with human coding. This web-based study (N=1,253; 3,046 free-text entries, averaging 64 characters per entry) included people with Duchenne Muscular Dystrophy (DMD), their siblings, and a representative comparison group. Human raters (n=6) were trained over multiple sessions in content analysis according to a comprehensive codebook. Three prompts addressed distinct aspects of participants’ aspirations. Unsupervised NLP was implemented using Latent Dirichlet Allocation (LDA), which extracts latent topics across all the free-text entries. Supervised NLP was implemented using a Bidirectional Encoder Representations from Transformers (BERT) model, which must be trained to recognize the relevant human-coded themes across free-text entries. We compared the human-, LDA-, and BERT-coded themes. The study sample contained 286 people with DMD, 355 DMD siblings, and 997 comparison participants, aged 8 to 69. Human coders generated 95 codes across the three prompts and had an average inter-rater reliability (Fleiss’s kappa) of 0.77, with a minimal rater effect (pseudo R² = 4%). Compared with human coders, LDA does not yield easily interpretable themes. BERT correctly classified only 61-70% of the validation set. LDA and BERT required technical expertise to program and took approximately 1.15 minutes per free-text entry, compared with 1.18 minutes for human raters including training time. LDA and BERT provide potentially viable approaches to analyzing large-scale qualitative data, but both have limitations. When text entries are short, LDA yields latent topics that are hard to interpret. BERT accurately identified only about two thirds of new statements. Humans provided reliable and cost-effective coding in the web-based context. Once trained, BERT can process enormous quantities of text data; future work should examine NLP’s predictive accuracy given different quantities of training data. |
| Author | Schwartz, Carolyn E.; Stark, Roland B.; Biletch, Elijah; Stuart, Richard B.B. |
| Author affiliations | Schwartz, Carolyn E.: DeltaQuest Foundation and Tufts University School of Medicine; Stark, Roland B.: DeltaQuest Foundation; Biletch, Elijah: DeltaQuest Foundation; Stuart, Richard B.B.: DeltaQuest Foundation |
| CorporateAuthor | Memorial Sloan Kettering Cancer Center |
| Discipline | Social Sciences (General) |
| EISSN | 2159-7855 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| License | CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0) |
| OpenAccessLink | https://doaj.org/article/e46021ad015f4dc595cd84514e2e654f |
| PublicationDate | 2022-10-01 |
| PublicationTitle | Journal of methods and measurement in the social sciences |
| SubjectTerms | efficiency; human; natural language processing; qualitative data |
| URI | http://journals.librarypublishing.arizona.edu/jmmss/article/id/5397/download/pdf/; https://doaj.org/article/e46021ad015f4dc595cd84514e2e654f |
| Volume | 13 |
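
The abstract reports an average inter-rater reliability (Fleiss’s kappa) of 0.77 for the six human raters. As a minimal sketch of what that statistic measures, and not the authors’ analysis code (which this record does not include), the function below computes Fleiss’s kappa in plain Python. It assumes that every entry is rated by the same number of raters and that each rater assigns exactly one category per entry; how the study handled entries carrying several codes (for example, computing kappa per code and averaging) is not described here. The name `fleiss_kappa` and its input layout are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Hypothetical helper: Fleiss's kappa for items rated by a fixed number of raters.

    ratings[i] lists the category labels the raters assigned to item i
    (one label per rater; the same number of raters for every item).
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    n_items = len(ratings)
    category_totals: Counter = Counter()
    observed_agreement = 0.0
    for item in ratings:
        counts = Counter(item)  # n_ij: raters placing item i in category j
        category_totals.update(counts)
        # P_i: proportion of rater pairs that agree on item i
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed_agreement += agreeing_pairs / (n_raters * (n_raters - 1))
    p_bar = observed_agreement / n_items  # mean observed agreement

    # P_e: agreement expected by chance from the overall category proportions p_j
    total = n_items * n_raters
    p_e = sum((t / total) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Three raters agree perfectly on two entries.
    print(fleiss_kappa([["goal", "goal", "goal"], ["family", "family", "family"]]))
```

Kappa is 1.0 under perfect agreement and near 0 when agreement is no better than chance, so a value of 0.77 indicates substantial agreement beyond what the category frequencies alone would produce.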