Extracting Symptoms of Complex Conditions From Online Discourse (Subreddit to Symptomatology): Lexicon-Based Approach
| Published in | JMIR medical informatics Vol. 13; p. e70940 |
|---|---|
| Main Authors | Hossain, Bushra; Preum, Sarah M; Rabbi, Md Fazle; Ara, Rifat; Ali, Mohammed Eunus |
| Format | Journal Article |
| Language | English |
| Published | Canada: JMIR Publications, 12.09.2025 |
| ISSN | 2291-9694 |
| DOI | 10.2196/70940 |
| Abstract | Background: Millions of people affected with complex medical conditions with diverse symptoms often turn to online discourse to share their experiences. While some studies have explored natural language processing methods and medical information extraction tools, these typically focus on generic symptoms in clinical notes and struggle to identify patient-reported, disease-specific, subtle symptoms from online health discourse.
Objective: We aimed to extract patient-reported, disease-specific symptoms shared on social media reflecting the lived experiences of thousands of affected individuals and explore the characteristics, prevalence, and occurrence patterns of the symptoms.
Methods: We propose a lexicon-based symptom extraction (LSE) method to identify a diverse list of disease-specific, patient-reported symptoms. We initially used a large language model to accelerate the extraction of symptom-related key phrases that formed the lexicon. We evaluated the effectiveness of lexicon extraction against human annotation using a Jaccard index score. We then leveraged BERT-Base, BioBERT, and Phrase-BERT-based embeddings to learn representations of these symptom-related key phrases and cluster similar symptoms using k-means and hierarchical density-based spatial clustering of applications with noise (HDBSCAN). Among the different options explored in our experiments, BioBERT-based k-means clustering was found to be the most effective. Finally, we applied symptom normalization to eliminate duplicate and redundant entries in the comprehensive symptom list.
Results: In a real-world polycystic ovary syndrome (PCOS) subreddit dataset, we found that LSE significantly outperformed state-of-the-art baselines, achieving at least 41% and 20% higher F1-scores (mean 86.10) than automatic medical extraction tools and large language models, respectively. Notably, the comprehensive list of 64 PCOS symptoms generated via LSE ensured extensive coverage of symptoms reported in 7 reputable eHealth forums. Analyzing PCOS symptomatology revealed 28 potentially emerging symptoms and 8 self-reported comorbidities co-occurring with PCOS.
Conclusions: The comprehensive patient-reported, disease-specific symptom list can help patients and health practitioners resolve uncertainties surrounding the disease, eliminating the variability of PCOS symptoms prevailing in the community. Analyzing PCOS symptomatology across varied dimensions provides valuable insights for public health research. |
|---|---|
| Author | Preum, Sarah M; Hossain, Bushra; Ara, Rifat; Ali, Mohammed Eunus; Rabbi, Md Fazle |
| Author_xml | – sequence: 1 givenname: Bushra orcidid: 0009-0006-0941-0374 surname: Hossain fullname: Hossain, Bushra – sequence: 2 givenname: Sarah M orcidid: 0000-0002-7771-8323 surname: Preum fullname: Preum, Sarah M – sequence: 3 givenname: Md Fazle orcidid: 0009-0004-1628-4519 surname: Rabbi fullname: Rabbi, Md Fazle – sequence: 4 givenname: Rifat orcidid: 0009-0007-4191-5492 surname: Ara fullname: Ara, Rifat – sequence: 5 givenname: Mohammed Eunus orcidid: 0000-0002-0384-7616 surname: Ali fullname: Ali, Mohammed Eunus |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40939164$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| Copyright | Bushra Hossain, Sarah M Preum, Md Fazle Rabbi, Rifat Ara, Mohammed Eunus Ali. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 12.09.2025. 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| Copyright_xml | – notice: Bushra Hossain, Sarah M Preum, Md Fazle Rabbi, Rifat Ara, Mohammed Eunus Ali. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 12.09.2025. – notice: 2025. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DBID | AAYXX CITATION CGR CUY CVF ECM EIF NPM 3V. 7X7 7XB 88C 8FI 8FJ 8FK ABUWG AFKRA AZQEC BENPR CCPQU COVID DWQXO FYUFA GHDGH K9. M0S M0T PHGZM PHGZT PIMPY PJZUB PKEHL PPXIY PQEST PQQKQ PQUKI 7X8 ADTOC UNPAY DOA |
| DatabaseName | CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed ProQuest Central (Corporate) ProQuest Health & Medical Collection ProQuest Central (purchase pre-March 2016) Healthcare Administration Database (Alumni) Hospital Premium Collection Hospital Premium Collection (Alumni Edition) ProQuest Central (Alumni) (purchase pre-March 2016) ProQuest Central (Alumni) ProQuest Central UK/Ireland ProQuest Central Essentials ProQuest Central ProQuest One Community College Coronavirus Research Database ProQuest Central Health Research Premium Collection Health Research Premium Collection (Alumni) ProQuest Health & Medical Complete (Alumni) Health & Medical Collection (Alumni Edition) Healthcare Administration Database ProQuest Central Premium ProQuest One Academic Publicly Available Content Database ProQuest Health & Medical Research Collection ProQuest One Academic Middle East (New) ProQuest One Health & Nursing ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Academic ProQuest One Academic UKI Edition MEDLINE - Academic Unpaywall for CDI: Periodical Content Unpaywall DOAJ Directory of Open Access Journals |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest Central Essentials ProQuest Health & Medical Complete (Alumni) ProQuest Central (Alumni Edition) ProQuest One Community College ProQuest One Health & Nursing ProQuest Central ProQuest Health & Medical Research Collection Health Research Premium Collection Health and Medicine Complete (Alumni Edition) ProQuest Central Korea Health & Medical Research Collection ProQuest Central (New) ProQuest One Academic Eastern Edition ProQuest Health Management Coronavirus Research Database ProQuest Hospital Collection Health Research Premium Collection (Alumni) ProQuest Hospital Collection (Alumni) ProQuest Health & Medical Complete ProQuest One Academic UKI Edition ProQuest Health Management (Alumni Edition) ProQuest One Academic ProQuest One Academic (New) ProQuest Central (Alumni) MEDLINE - Academic |
| DatabaseTitleList | Publicly Available Content Database MEDLINE MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 3 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database – sequence: 4 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository – sequence: 5 dbid: BENPR name: ProQuest Central url: http://www.proquest.com/pqcentral?accountid=15518 sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Medicine; Public Health |
| EISSN | 2291-9694 |
| ExternalDocumentID | oai_doaj_org_article_b614597aa4524353b7d16332b208fca4 10.2196/70940 40939164 10_2196_70940 |
| Genre | Journal Article |
| GroupedDBID | 53G 5VS 7X7 8FI 8FJ AAFWJ AAYXX ABUWG ADBBV AFKRA AFPKN ALMA_UNASSIGNED_HOLDINGS AOIJS BAWUL BCNDV BENPR CCPQU CITATION DIK EMOBN FYUFA GROUPED_DOAJ HMCUK HYE KQ8 M0T M~E OK1 PGMZT PHGZM PHGZT PIMPY PJZUB PPXIY PUEGO RPM UKHRP CGR CUY CVF ECM EIF NPM 3V. 7XB 8FK AZQEC COVID DWQXO K9. M48 PKEHL PQEST PQQKQ PQUKI 7X8 ADRAZ ADTOC UNPAY |
| ID | FETCH-LOGICAL-c368t-ef879b4a9762abae57413acb35b8664000b818881c8c802c56e11a30d888dc253 |
| IEDL.DBID | BENPR |
| IngestDate | Fri Oct 03 12:46:15 EDT 2025 Sun Oct 26 04:03:15 EDT 2025 Sat Sep 13 16:50:58 EDT 2025 Tue Oct 07 07:28:33 EDT 2025 Wed Oct 01 06:56:56 EDT 2025 Wed Oct 01 05:24:01 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | natural language processing; online discourse; large language models; symptomatology; health information extraction; complex medical condition; disease-specific symptoms; polycystic ovary syndrome |
| License | Bushra Hossain, Sarah M Preum, Md Fazle Rabbi, Rifat Ara, Mohammed Eunus Ali. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 12.09.2025. cc-by |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c368t-ef879b4a9762abae57413acb35b8664000b818881c8c802c56e11a30d888dc253 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ORCID | 0000-0002-7771-8323 0009-0007-4191-5492 0009-0004-1628-4519 0009-0006-0941-0374 0000-0002-0384-7616 |
| OpenAccessLink | https://www.proquest.com/docview/3252498363?pq-origsite=%requestingapplication%&accountid=15518 |
| PMID | 40939164 |
| PQID | 3252498363 |
| PQPubID | 4997117 |
| ParticipantIDs | doaj_primary_oai_doaj_org_article_b614597aa4524353b7d16332b208fca4 unpaywall_primary_10_2196_70940 proquest_miscellaneous_3250116000 proquest_journals_3252498363 pubmed_primary_40939164 crossref_primary_10_2196_70940 |
| ProviderPackageCode | CITATION AAYXX |
| PublicationCentury | 2000 |
| PublicationDate | 2025-Sep-12 |
| PublicationDateYYYYMMDD | 2025-09-12 |
| PublicationDate_xml | – month: 09 year: 2025 text: 2025-Sep-12 day: 12 |
| PublicationDecade | 2020 |
| PublicationPlace | Canada |
| PublicationPlace_xml | – name: Canada – name: Toronto |
| PublicationTitle | JMIR medical informatics |
| PublicationTitleAlternate | JMIR Med Inform |
| PublicationYear | 2025 |
| Publisher | JMIR Publications |
| Publisher_xml | – name: JMIR Publications |
| SSID | ssj0001416667 |
| Score | 2.3055801 |
| SourceID | doaj unpaywall proquest pubmed crossref |
| SourceType | Open Website Open Access Repository Aggregation Database Index Database |
| StartPage | e70940 |
| SubjectTerms | Application programming interface; Data collection; Data Mining - methods; Datasets; Disease; Female; Humans; Large language models; Multimedia; Natural Language Processing; Patients; Polycystic ovary syndrome; Public health; Social Media; Social networks; Subject specialists |
| Title | Extracting Symptoms of Complex Conditions From Online Discourse (Subreddit to Symptomatology): Lexicon-Based Approach |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/40939164 https://www.proquest.com/docview/3252498363 https://www.proquest.com/docview/3250116000 https://doi.org/10.2196/70940 https://doaj.org/article/b614597aa4524353b7d16332b208fca4 |
| UnpaywallVersion | publishedVersion |
| Volume | 13 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| openUrl | Hossain, Bushra; Preum, Sarah M; Rabbi, Md Fazle; Ara, Rifat. Extracting Symptoms of Complex Conditions From Online Discourse (Subreddit to Symptomatology): Lexicon-Based Approach. JMIR medical informatics, Vol. 13, p. e70940, 2025-09-12. ISSN 2291-9694. DOI 10.2196/70940 |