Keyword Extraction Algorithm for Classifying Smoking Status from Unstructured Bilingual Electronic Health Records Based on Natural Language Processing
| Published in | Applied sciences Vol. 11; no. 19; p. 8812 |
|---|---|
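| Main Authors | Bae, Ye Seul; Kim, Kyung Hwan; Kim, Han Kyul; Choi, Sae Won; Ko, Taehoon; Seo, Hee Hwa; Lee, Hae-Young; Jeon, Hyojin |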
| Format | Journal Article |
| Language | English |
| Published | Basel: MDPI AG, 01.10.2021 |
| ISSN | 2076-3417 |
| DOI | 10.3390/app11198812 |
| Abstract | Smoking is an important variable for clinical research, but few studies have addressed the automatic extraction of smoking status from unstructured bilingual electronic health records (EHRs). We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP). Using acronym replacement and the Python package Soynlp, we normalize 4711 bilingual clinical notes. Each EHR note is classified into four categories: current smoker, past smoker, never smoker, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize the words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified. Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are then used to classify the four smoking statuses in our bilingual EHRs. Given an identical SVM classifier, the F1 score improves by as much as 1.8% compared to unigram and bigram Bag of Words. Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our findings show how smoking information can be readily acquired for clinical practice and research. |
| Featured Application | The study presents an improved and easily obtainable method for automatic smoking classification from unstructured bilingual electronic health records. |
| Author | Lee, Hae-Young; Ko, Taehoon; Seo, Hee Hwa; Jeon, Hyojin; Bae, Ye Seul; Kim, Han Kyul; Kim, Kyung Hwan; Choi, Sae Won |
| Author_xml | 1. Bae, Ye Seul (ORCID 0000-0003-0763-5458); 2. Kim, Kyung Hwan; 3. Kim, Han Kyul (ORCID 0000-0002-4854-7211); 4. Choi, Sae Won (ORCID 0000-0002-0123-8227); 5. Ko, Taehoon; 6. Seo, Hee Hwa (ORCID 0000-0002-6442-8220); 7. Lee, Hae-Young (ORCID 0000-0002-9521-4102); 8. Jeon, Hyojin |
| Copyright | 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| Discipline | Engineering Sciences (General) |
| EISSN | 2076-3417 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| License | cc-by |
| PublicationDate | 2021-10-01 |
| StartPage | 8812 |
| SubjectTerms | Algorithms; Bilingualism; Cardiovascular disease; Datasets; document classification; Electronic health records; Hospitals; Keywords; lifestyle modification; Medical records; Medical research; Natural language processing; Patients; Performance evaluation; smoking |
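
The keyword extraction step summarized in the abstract (SPPMI word vectors compared by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the context window size, the shift parameter `k`, and the toy tokens are all assumptions made for illustration, and the record does not state which settings the study used. SPPMI is taken here in its usual form, max(PMI(w, c) - log k, 0), computed over word co-occurrence counts.

```python
import numpy as np


def sppmi_matrix(docs, window=2, k=1):
    """Return (vocab, matrix) where matrix[i, j] is the SPPMI of words i and j.

    docs   : list of tokenized notes (lists of strings)
    window : hypothetical context window size (assumption)
    k      : shift; SPPMI = max(PMI - log(k), 0)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)), dtype=float)

    # Count co-occurrences inside a symmetric window around each token.
    for doc in docs:
        for i, word in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[doc[j]]] += 1.0

    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # shape (V, 1)
    col = counts.sum(axis=0, keepdims=True)   # shape (1, V)
    expected = row @ col                      # outer product of marginals

    # PMI is only defined for observed pairs; unobserved pairs stay at 0.
    sppmi = np.zeros_like(counts)
    seen = counts > 0
    pmi = np.log(counts[seen] * total / expected[seen])
    sppmi[seen] = np.maximum(pmi - np.log(k), 0.0)
    return vocab, sppmi


def cosine_similarity(matrix):
    """Pairwise cosine similarity between the rows of matrix."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # avoid division by zero
    unit = matrix / norms
    return unit @ unit.T


def similar_keywords(seed, vocab, sims, top_n=3):
    """Return the top_n words whose SPPMI vectors are closest to the seed."""
    i = vocab.index(seed)
    order = np.argsort(-sims[i])
    return [vocab[j] for j in order if j != i][:top_n]


# Invented toy notes (not from the study's data), mixing English and Korean.
notes = [
    ["current", "smoker", "1", "pack", "day"],
    ["흡연", "1", "pack", "day"],
    ["ex", "smoker", "quit", "5", "years"],
    ["금연", "quit", "5", "years"],
]
vocab, vectors = sppmi_matrix(notes, window=2, k=1)
sims = cosine_similarity(vectors)
print(similar_keywords("smoker", vocab, sims))
```

With `k=1` the shift term is zero and the matrix reduces to plain positive PMI; larger `k` zeroes out weaker associations. In a pipeline like the one the abstract describes, the words returned for each seed term would then serve as candidate keywords for a downstream classifier.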