Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes

Bibliographic Details
Published in Journal of Medical Internet Research Vol. 19; no. 11; p. e380
Main Authors Lin, Chin; Hsu, Chia-Jung; Lou, Yu-Sheng; Yeh, Shih-Jen; Lee, Chia-Cheng; Su, Sui-Lung; Chen, Hsiang-Cheng
Format Journal Article
Language English
Published Canada: Gunther Eysenbach MD MPH, Associate Professor, 06.11.2017
JMIR Publications
ISSN 1438-8871
1439-4456
DOI 10.2196/jmir.8344

Abstract
Background: Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN).
Objective: Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes.
Methods: We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness.
Results: In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes.
Conclusions: Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data.
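The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The Python below computes the AUC through its pairwise-ranking (Mann-Whitney) interpretation and the F-measure, assumed here to be F1 at a decision threshold of 0.5. The function names, the threshold, and the toy labels and scores are hypothetical illustrations, not values or code from the study.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f_measure(labels, scores, threshold=0.5):
    """F1: harmonic mean of precision and recall (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data for one diagnosis chapter: 1 = code present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
chapter_auc = auc(labels, scores)
chapter_f1 = f_measure(labels, scores)
```

A mean AUC or mean F-measure of the kind reported above would then be an average of such per-chapter values; how the study averaged them is not stated in this record.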
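The second method in the abstract builds a feature matrix from word embeddings and trains a CNN on it. The sketch below shows only the core idea in plain Python: look up a vector per token, slide one convolutional filter over adjacent words, apply ReLU, and keep the maximum activation. The embedding table, the filter weights, the filter width of two words, and the max-over-time pooling are all assumptions for illustration; this record does not describe the study's actual architecture or pretrained model.

```python
# Hypothetical 3-dimensional embedding table; a real pretrained model
# would supply vectors with many more dimensions.
EMBEDDINGS = {
    "acute": [0.2, 0.1, 0.0],
    "renal": [0.9, 0.3, 0.1],
    "failure": [0.8, 0.5, 0.2],
    "<unk>": [0.0, 0.0, 0.0],
}


def embed(tokens):
    """Turn a token list into a feature matrix (one embedding row per token)."""
    return [EMBEDDINGS.get(t, EMBEDDINGS["<unk>"]) for t in tokens]


def conv_max_pool(matrix, kernel, bias=0.0):
    """Apply one 1D convolutional filter over word windows, then ReLU and max pooling."""
    width = len(kernel)
    if len(matrix) < width:
        raise ValueError("note is shorter than the filter width")
    activations = []
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            for w, x in zip(kernel[offset], matrix[start + offset]):
                total += w * x
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # strongest response anywhere in the note


tokens = "acute renal failure noted".split()
# Hypothetical filter that responds to the first embedding dimension of two adjacent words.
kernel = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
feature = conv_max_pool(embed(tokens), kernel)
```

In this toy case the filter responds most strongly to the window "renal failure", which is the sense in which a convolutional layer can act as a learned keyword or phrase detector.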
Author Lou, Yu-Sheng
Lee, Chia-Cheng
Yeh, Shih-Jen
Chen, Hsiang-Cheng
Hsu, Chia-Jung
Lin, Chin
Su, Sui-Lung
AuthorAffiliation 1 School of Public Health, National Defense Medical Center, Taipei, Taiwan
2 Department of Research and Development, National Defense Medical Center, Taipei, Taiwan
3 Planning and Management Office, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
4 Da-Yeh University, Changhua, Taiwan
5 Division of Rheumatology/Immunology/Allergy, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
Author_xml – sequence: 1
  givenname: Chin
  orcidid: 0000-0003-2337-2096
  surname: Lin
  fullname: Lin, Chin
– sequence: 2
  givenname: Chia-Jung
  orcidid: 0000-0001-9969-4855
  surname: Hsu
  fullname: Hsu, Chia-Jung
– sequence: 3
  givenname: Yu-Sheng
  orcidid: 0000-0001-9115-2656
  surname: Lou
  fullname: Lou, Yu-Sheng
– sequence: 4
  givenname: Shih-Jen
  orcidid: 0000-0001-5393-3996
  surname: Yeh
  fullname: Yeh, Shih-Jen
– sequence: 5
  givenname: Chia-Cheng
  orcidid: 0000-0002-7450-504X
  surname: Lee
  fullname: Lee, Chia-Cheng
– sequence: 6
  givenname: Sui-Lung
  orcidid: 0000-0003-3122-1116
  surname: Su
  fullname: Su, Sui-Lung
– sequence: 7
  givenname: Hsiang-Cheng
  orcidid: 0000-0002-0753-6161
  surname: Chen
  fullname: Chen, Hsiang-Cheng
BackLink https://www.ncbi.nlm.nih.gov/pubmed/29109070 (MEDLINE/PubMed record)
ContentType Journal Article
Copyright 2017. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Chin Lin, Chia-Jung Hsu, Yu-Sheng Lou, Shih-Jen Yeh, Chia-Cheng Lee, Sui-Lung Su, Hsiang-Cheng Chen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 06.11.2017.
Discipline Medicine
Library & Information Science
Public Health
EISSN 1438-8871
ExternalDocumentID 10.2196/jmir.8344
PMC5696581
29109070
10_2196_jmir_8344
Genre Journal Article
GeographicLocations Taiwan
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Keywords electronic medical records
neural networks (computer)
natural language processing
word embedding
convolutional neural network
data mining
text mining
electronic health records
machine learning
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
cc-by
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.2196/jmir.8344
PMID 29109070
PublicationDate 20171106
PublicationDateYYYYMMDD 2017-11-06
PublicationDate_xml – month: 11
  year: 2017
  text: 20171106
  day: 6
PublicationDecade 2010
PublicationPlace Canada
PublicationPlace_xml – name: Canada
– name: Toronto
– name: Toronto, Canada
PublicationTitle Journal of medical Internet research
PublicationTitleAlternate J Med Internet Res
PublicationYear 2017
Publisher Gunther Eysenbach MD MPH, Associate Professor
JMIR Publications
StartPage e380
SubjectTerms Algorithms
Artificial intelligence
Automation
Big Data
Circulatory system
Classification
Clinical information
Codes
Dictionaries
Disease
Forests
Health care
Health insurance
Health surveillance
Hospitalized
Hospitals
Intelligence
Machine learning
Medical diagnosis
Medical records
Neural networks
Original Paper
Pipelines
Public health
Semantics
Simulation
Streptococcus infections
Surveillance
Tumors
Validity
URI https://www.ncbi.nlm.nih.gov/pubmed/29109070
https://www.proquest.com/docview/2512795621
https://www.proquest.com/docview/1961637784
https://pubmed.ncbi.nlm.nih.gov/PMC5696581
https://www.jmir.org/2017/11/e380/PDF
UnpaywallVersion publishedVersion
Volume 19
  customDbUrl:
  eissn: 1438-8871
  dateEnd: 99991231
  omitProxy: true
  ssIdentifier: ssj0020491
  issn: 1439-4456
  databaseCode: 7X7
  dateStart: 20010101
  isFulltext: true
  titleUrlDefault: https://search.proquest.com/healthcomplete
  providerName: ProQuest
– providerCode: PRVFZP
  databaseName: Scholars Portal Journals: Open Access
  customDbUrl:
  eissn: 1438-8871
  dateEnd: 20250131
  omitProxy: true
  ssIdentifier: ssj0020491
  issn: 1439-4456
  databaseCode: M48
  dateStart: 20100201
  isFulltext: true
  titleUrlDefault: http://journals.scholarsportal.info
  providerName: Scholars Portal