Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes

Bibliographic Details
Published in	Journal of Medical Internet Research, Vol. 19, no. 11, p. e380
Main Authors	Lin, Chin; Hsu, Chia-Jung; Lou, Yu-Sheng; Yeh, Shih-Jen; Lee, Chia-Cheng; Su, Sui-Lung; Chen, Hsiang-Cheng
Format	Journal Article
Language	English
Published	Canada: Gunther Eysenbach MD MPH, Associate Professor; JMIR Publications, 06.11.2017
Subjects	Algorithms; Artificial intelligence; Automation; Big Data; Circulatory system; Classification; Clinical information; Codes; Dictionaries; Disease; Forests; Health care; Health insurance; Health surveillance; Hospitalized; Hospitals; Intelligence; Machine learning; Medical diagnosis; Medical records; Neural networks; Original Paper; Pipelines; Public health; Semantics; Simulation; Streptococcus infections; Surveillance; Tumors; Validity; Taiwan; electronic medical records; neural networks (computer); natural language processing; word embedding; convolutional neural network; data mining; text mining; electronic health records
ISSN	1438-8871 (electronic), 1439-4456
DOI	10.2196/jmir.8344

Abstract	Background: Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN). Objective: Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes. Methods: We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness. Results: In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes. Conclusions: Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data.
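The abstract reports AUC and F-measure as its evaluation measures. The following is a minimal illustrative sketch of how those two quantities can be computed for a single binary label (for example, one chapter-level code); it is not the authors' code. The function names, the toy data, the 0.5 decision threshold, and the choice of the balanced F1 form of the F-measure are all assumptions made for illustration.

```python
def f_measure(y_true, y_pred):
    """Balanced F-measure (F1, assumed here): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = note carries the code, 0 = it does not.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]          # classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print(round(auc(y_true, scores), 4))       # 0.7778
print(round(f_measure(y_true, y_pred), 4)) # 0.6667
```

A "mean AUC" or "mean F-measure" such as those quoted above would then be an average of these per-label values, here assumed to be taken over labels and cross-validation folds.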
AuthorAffiliation	1 School of Public Health, National Defense Medical Center, Taipei, Taiwan; 2 Department of Research and Development, National Defense Medical Center, Taipei, Taiwan; 3 Planning and Management Office, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; 4 Da-Yeh University, Changhua, Taiwan; 5 Division of Rheumatology/Immunology/Allergy, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
Author_xml	1. Lin, Chin (ORCID 0000-0003-2337-2096); 2. Hsu, Chia-Jung (ORCID 0000-0001-9969-4855); 3. Lou, Yu-Sheng (ORCID 0000-0001-9115-2656); 4. Yeh, Shih-Jen (ORCID 0000-0001-5393-3996); 5. Lee, Chia-Cheng (ORCID 0000-0002-7450-504X); 6. Su, Sui-Lung (ORCID 0000-0003-3122-1116); 7. Chen, Hsiang-Cheng (ORCID 0000-0002-0753-6161)
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/29109070 (record in MEDLINE/PubMed)
ContentType	Journal Article
Copyright	2017. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. Chin Lin, Chia-Jung Hsu, Yu-Sheng Lou, Shih-Jen Yeh, Chia-Cheng Lee, Sui-Lung Su, Hsiang-Cheng Chen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 06.11.2017.
Discipline	Medicine; Library & Information Science; Public Health
EISSN	1438-8871
ExternalDocumentID	10.2196/jmir.8344 PMC5696581 29109070 10_2196_jmir_8344
Genre	Journal Article
GeographicLocations	Taiwan
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	11
Keywords	electronic medical records; neural networks (computer); natural language processing; word embedding; convolutional neural network; data mining; text mining; electronic health records; machine learning
Language	English
License	This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included. cc-by
OpenAccessLink	http://journals.scholarsportal.info/openUrl.xqy?doi=10.2196/jmir.8344
PMID	29109070
PublicationDate	2017-11-06
PublicationPlace	Toronto, Canada
PublicationTitle	Journal of medical Internet research
PublicationTitleAlternate	J Med Internet Res
PublicationYear	2017
Publisher	Gunther Eysenbach MD MPH, Associate Professor; JMIR Publications
StartPage	e380
URI	https://www.ncbi.nlm.nih.gov/pubmed/29109070 https://www.proquest.com/docview/2512795621 https://www.proquest.com/docview/1961637784 https://pubmed.ncbi.nlm.nih.gov/PMC5696581 https://www.jmir.org/2017/11/e380/PDF
UnpaywallVersion	publishedVersion
Volume	19
hasFullText	1
inHoldings	1
isFullTextHit
isPrint