Extracting Symptoms of Complex Conditions From Online Discourse (Subreddit to Symptomatology): Lexicon-Based Approach

Bibliographic Details
Published in JMIR medical informatics Vol. 13; p. e70940
Main Authors Hossain, Bushra; Preum, Sarah M; Rabbi, Md Fazle; Ara, Rifat; Ali, Mohammed Eunus
Format Journal Article
Language English
Published Canada JMIR Publications 12.09.2025
ISSN 2291-9694
DOI 10.2196/70940

Abstract
Background: Millions of people affected with complex medical conditions with diverse symptoms often turn to online discourse to share their experiences. While some studies have explored natural language processing methods and medical information extraction tools, these typically focus on generic symptoms in clinical notes and struggle to identify patient-reported, disease-specific, subtle symptoms from online health discourse.
Objective: We aimed to extract patient-reported, disease-specific symptoms shared on social media reflecting the lived experiences of thousands of affected individuals and explore the characteristics, prevalence, and occurrence patterns of the symptoms.
Methods: We propose a lexicon-based symptom extraction (LSE) method to identify a diverse list of disease-specific, patient-reported symptoms. We initially used a large language model to accelerate the extraction of symptom-related key phrases that formed the lexicon. We evaluated the effectiveness of lexicon extraction against human annotation using a Jaccard index score. We then leveraged BERT-Base, BioBERT, and Phrase-BERT-based embeddings to learn representations of these symptom-related key phrases and cluster similar symptoms using k-means and hierarchical density-based spatial clustering of applications with noise (HDBSCAN). Among the different options explored in our experiments, BioBERT-based k-means clustering was found to be the most effective. Finally, we applied symptom normalization to eliminate duplicate and redundant entries in the comprehensive symptom list.
Results: In a real-world polycystic ovary syndrome (PCOS) subreddit dataset, we found that LSE significantly outperformed state-of-the-art baselines, achieving at least 41% and 20% higher F1-scores (mean 86.10) than automatic medical extraction tools and large language models, respectively. Notably, the comprehensive list of 64 PCOS symptoms generated via LSE ensured extensive coverage of symptoms reported in 7 reputable eHealth forums. Analyzing PCOS symptomatology revealed 28 potentially emerging symptoms and 8 self-reported comorbidities co-occurring with PCOS.
Conclusions: The comprehensive patient-reported, disease-specific symptom list can help patients and health practitioners resolve uncertainties surrounding the disease, eliminating the variability of PCOS symptoms prevailing in the community. Analyzing PCOS symptomatology across varied dimensions provides valuable insights for public health research.
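The abstract names two set-overlap measures: a Jaccard index for comparing the extracted lexicon against human annotation, and F1-scores for comparing extraction methods. The following is a minimal sketch of how such scores are commonly computed over sets of key phrases. It is not the authors' implementation: the function names, the case and whitespace normalization, the exact-match comparison, and the sample phrases are all assumptions made for illustration.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic normalization)."""
    return " ".join(phrase.lower().split())


def jaccard_index(extracted, annotated) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between two phrase collections."""
    a = {normalize(p) for p in extracted}
    b = {normalize(p) for p in annotated}
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def f1_score(predicted, gold) -> float:
    """F1 = harmonic mean of precision and recall under exact phrase matching."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(p) for p in gold}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical phrases, invented for this example (not taken from the study's data).
llm_phrases = ["irregular periods", "Hair loss", "acne", "fatigue"]
human_phrases = ["irregular periods", "hair loss", "weight gain", "acne"]

jaccard = jaccard_index(llm_phrases, human_phrases)
f1 = f1_score(llm_phrases, human_phrases)
print(f"Jaccard index: {jaccard:.2f}")
print(f"F1-score: {f1:.2f}")
```

Exact matching after normalization is the simplest possible choice; a study working with free-text patient language may well use partial or semantic matching instead, which would change both scores.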
Author_xml – sequence: 1
  givenname: Bushra
  orcidid: 0009-0006-0941-0374
  surname: Hossain
  fullname: Hossain, Bushra
– sequence: 2
  givenname: Sarah M
  orcidid: 0000-0002-7771-8323
  surname: Preum
  fullname: Preum, Sarah M
– sequence: 3
  givenname: Md Fazle
  orcidid: 0009-0004-1628-4519
  surname: Rabbi
  fullname: Rabbi, Md Fazle
– sequence: 4
  givenname: Rifat
  orcidid: 0009-0007-4191-5492
  surname: Ara
  fullname: Ara, Rifat
– sequence: 5
  givenname: Mohammed Eunus
  orcidid: 0000-0002-0384-7616
  surname: Ali
  fullname: Ali, Mohammed Eunus
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40939164 (MEDLINE/PubMed record)
ContentType Journal Article
Copyright Bushra Hossain, Sarah M Preum, Md Fazle Rabbi, Rifat Ara, Mohammed Eunus Ali. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 12.09.2025.
2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Medicine
Public Health
EISSN 2291-9694
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords natural language processing
online discourse
large language models
symptomatology
health information extraction
complex medical condition
disease-specific symptoms
polycystic ovary syndrome
Language English
License Bushra Hossain, Sarah M Preum, Md Fazle Rabbi, Rifat Ara, Mohammed Eunus Ali. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 12.09.2025.
cc-by
OpenAccessLink https://www.proquest.com/docview/3252498363?pq-origsite=%requestingapplication%&accountid=15518
PMID 40939164
PublicationCentury 2000
PublicationDate 2025-Sep-12
PublicationDateYYYYMMDD 2025-09-12
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-Sep-12
  day: 12
PublicationDecade 2020
PublicationPlace Canada
PublicationPlace_xml – name: Canada
– name: Toronto
PublicationTitle JMIR medical informatics
PublicationTitleAlternate JMIR Med Inform
PublicationYear 2025
Publisher JMIR Publications
Publisher_xml – name: JMIR Publications
SubjectTerms Application programming interface
Data collection
Data Mining - methods
Datasets
Disease
Female
Humans
Large language models
Multimedia
Natural Language Processing
Patients
Polycystic ovary syndrome
Public health
Social Media
Social networks
Subject specialists
URI https://www.ncbi.nlm.nih.gov/pubmed/40939164
https://www.proquest.com/docview/3252498363
https://www.proquest.com/docview/3250116000
https://doi.org/10.2196/70940
https://doaj.org/article/b614597aa4524353b7d16332b208fca4
linkProvider ProQuest
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1ta9swED5KC-1KGVv35q1rNdhg-2DqSLKtDMpo1oR0TcLYWug3V5LlUkjsLHZo8uf223byS7pB2bd-MtjGtnSP7u7R-e4A3nNDqQyRqYZhrF3kX9oVxmMub6sgDP0Eabbd0B-Ogv4F_3bpX67B7yYXxv5W2ejEUlHHmbZ75IeM-sgUBAvYl-kv13aNstHVpoWGrFsrxEdlibE6sePMLG-RwuVHpyco7w-U9rrnX_tu3WXA1SwQhWsSEbYVl2iXqVTS-GhjmdSK-UoEAULcU2jUhGhpoYVHtR-YVksyL8Zzsaa2awSagA3OeBvJ30anO_r-426Xh9uwXLgJO_afa0T7YWgL1v1jBMteAfc5uNuwNU-ncnkrx-O_jF7vCTyuvVVyXMHrKayZdBc2h3U8fhd2ql0_UiUzPYN5d1GUaVfpNfm5nEyLbJKTLCFW64zNAo82Qm6RTnqzbEKqQqfk5CbHiZ3lhnxERTYzMd5Eiqx5hCx77C4_fSYDs0Dkpm4HbW9Mjut66M_h4kGm_AWsp1lqXgFBOCmpE8GYNBzx1o4Fb_EEWbFmJubMgf1mpqNpVb8jQt5jRRGVonCgY-d_ddGW2y5PZLPrqF69kUInBpmXlBxByHymwhj9WEYV9USiJXdgr5FeVOuAPLpDrAPvVpdx9dqQjExNNi_vsZEwHLEDLyupr74EmbfNisaHH6xgcP8YXv__9Qew1T8fDqLB6ejsDTyitoNx2QRjD9aL2dy8RbeqUPs1dglcPfRy-QMVoTMM
linkToPdf http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV3raxNBEB9KhagU0fqK1nYFBf1wJNndu90IIq1paO0DQQv5du7u7RUhuYvJhSb_mn-dM_dIFYrf-ulg79i72_nNa2d2BuCN9JwbhZ6qUokL0P9ygfZdEci-jZQKU3SzaUP_7Dw6upBfRuFoA343Z2EorbKRiaWgTnJHe-QdwUP0FLSIRCet0yK-Doafpr8C6iBFkdamnUYFkRO_ukL3bf7xeIC0fsv58PD756Og7jAQOBHpIvCpVn0rDepkbqzxIepXYZwVodVRhPDuWlRoWvecdrrLXRj5Xs-IboJjiePUMQLF_x0lRJ_SCdVIXe_vSArIqRZsUbY14ryjqFTdP-qv7BJwk2l7H-4usqlZXZnx-C91N3wID2o7le1XwHoEGz7bhtZZHYnfhq1qv49Vx5gew-JwWZQHrrJL9m01mRb5ZM7ylJG8GfslXik2Thhnw1k-YVWJUzb4OXc55ZGwdyjCZj7Bh1iRN1OYsrvu6v0HduqXiNksOECtm7D9uhL6E7i4lQV_CptZnvnnwBBI1rhUC2G8RKT1Ey17MkV_2AmfSNGG3Wal42lVuSNGj4dIEZekaMMBrf_6JhXaLgfy2WVc821s0XxBn8sYifATobAqQQtWcMu7OnVGtmGnoV5cc_88vsZqG16vbyPfUjDGZD5flM9QDAz_uA3PKqqvvwR9bjoPjZPvrWFw8z-8-P_r96CFTBKfHp-fvIR7nFoXl90vdmCzmC38K7SnCrtbApfBj9vmlD-pjzCm
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Extracting+Symptoms+of+Complex+Conditions+From+Online+Discourse+%28Subreddit+to+Symptomatology%29%3A+Lexicon-Based+Approach&rft.jtitle=JMIR+medical+informatics&rft.au=Hossain%2C+Bushra&rft.au=Preum%2C+Sarah+M&rft.au=Rabbi%2C+Md+Fazle&rft.au=Ara%2C+Rifat&rft.date=2025-09-12&rft.issn=2291-9694&rft.eissn=2291-9694&rft.volume=13&rft.spage=e70940&rft_id=info:doi/10.2196%2F70940&rft.externalDBID=n%2Fa&rft.externalDocID=10_2196_70940