Extracting Diagnoses and Investigation Results from Unstructured Text in Electronic Health Records by Semi-Supervised Machine Learning

Bibliographic Details
Published in PLoS ONE Vol. 7; no. 1; p. e30412
Main Authors Wang, Zhuoran; Shah, Anoop D.; Tate, A. Rosemary; Denaxas, Spiros; Shawe-Taylor, John; Hemingway, Harry
Format Journal Article
Language English
Published United States: Public Library of Science (PLoS), 19.01.2012
ISSN 1932-6203
DOI 10.1371/journal.pone.0030412

Abstract
Background: Electronic health records are invaluable for medical research, but much of the information is recorded as unstructured free text, which is time-consuming to review manually.
Aim: To develop an algorithm to identify relevant free texts automatically based on labelled examples.
Methods: We developed a novel machine learning algorithm, the 'Semi-supervised Set Covering Machine' (S3CM), and tested its ability to detect the presence of coronary angiogram results and ovarian cancer diagnoses in free text in the General Practice Research Database. For training the algorithm, we used texts classified as positive and negative according to their associated Read diagnostic codes, rather than by manual annotation. We evaluated the precision (positive predictive value) and recall (sensitivity) of S3CM in classifying unlabelled texts against the gold standard of manual review. We compared the performance of S3CM with the Transductive Support Vector Machine (TSVM), the original fully-supervised Set Covering Machine (SCM) and our 'Freetext Matching Algorithm' natural language processor.
Results: Only 60% of texts with Read codes for angiogram actually contained angiogram results. However, the S3CM algorithm achieved 87% recall with 64% precision in detecting coronary angiogram results, outperforming the fully-supervised SCM (recall 78%, precision 60%) and TSVM (recall 2%, precision 3%). For ovarian cancer diagnoses, S3CM had higher recall than the other algorithms tested (86%). The Freetext Matching Algorithm had better precision than S3CM (85% versus 74%) but lower recall (62%).
Conclusions: Our novel S3CM machine learning algorithm effectively detected free texts in primary care records associated with angiogram results and ovarian cancer diagnoses, after training on pre-classified test sets. It should be easy to adapt to other disease areas as it does not rely on linguistic rules, but needs further testing in other electronic health record datasets.
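The abstract reports precision (positive predictive value) and recall (sensitivity) against manual review. As a reading aid, the following is a minimal Python sketch of how those two measures are computed for a binary text classifier. It is not the authors' evaluation code; the function name `precision_recall` and the toy labels are assumptions made for illustration and are not data from the study.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) of binary predictions against a gold standard."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    # True positives: flagged by the classifier and confirmed by manual review.
    tp = sum(p and g for p, g in zip(predicted, gold))
    # False positives: flagged by the classifier but rejected by manual review.
    fp = sum(p and not g for p, g in zip(predicted, gold))
    # False negatives: missed by the classifier but positive on manual review.
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (illustrative only, not taken from the paper).
predicted = [True, True, True, False, False, True]
gold = [True, False, True, True, False, True]

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch a high recall with moderate precision, as reported for S3CM, would correspond to few false negatives and a larger number of false positives.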
Audience Academic
Author Affiliations
1 Department of Computer Science, University College London, London, United Kingdom
2 School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, United Kingdom
3 Clinical Epidemiology Group, Department of Epidemiology and Public Health, University College London, London, United Kingdom
4 Department of Informatics, University of Sussex, Brighton, United Kingdom
Dana-Farber Cancer Institute, United States of America
Copyright 2012 Wang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Discipline Sciences (General)
Public Health
Computer Science
Medicine
Biology
Mathematics
DocumentTitleAlternate Machine Learning to Extract Information from Text
PMCID PMC3261909
Genre Research Support, Non-U.S. Gov't
Journal Article
Funding Wellcome Trust (grant IDs 093830, 0938/30/Z/10/Z, 086091/Z/08/Z)
Department of Health (grant ID RP-PG-0407-10314)
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
cc-by
Creative Commons Attribution License
Notes Conceived and designed the experiments: ZW. Performed the experiments: ZW. Analyzed the data: ZW ADS ART. Wrote the paper: ADS. Reviewed and contributed to the manuscript: ZW ART SD JST HH. Obtained anonymised free text from GPRD for testing the algorithm: SD ART. Study supervision: JST HH.
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.1371/journal.pone.0030412
PMID 22276193
PQID 1323078583
PQPubID 1436336
PageCount e30412
PublicationCentury 2000
PublicationDate 2012-01-19
PublicationDateYYYYMMDD 2012-01-19
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: San Francisco
– name: San Francisco, USA
PublicationTitle PloS one
PublicationTitleAlternate PLoS One
PublicationYear 2012
Publisher Public Library of Science
Public Library of Science (PLoS)
StartPage e30412
SubjectTerms Active learning
Algorithms
Angiography
Annotations
Artificial Intelligence
Biology
Cancer
Cancer research
Codes
Computer Science
Confidentiality
Data mining
Diagnostic systems
Electronic Health Records
Electronic medical records
Electronic records
Epidemiology
Evaluation
Family medicine
Female
Health care
Health informatics
Health services
Humans
International conferences
Language
Learning algorithms
Machine learning
Male
Matching
Mathematics
Medical diagnosis
Medical records
Medical research
Medicine
Microprocessors
Morphology
Natural language processing
Ovarian cancer
Ovarian carcinoma
Ovarian Neoplasms - diagnosis
Patients
Primary care
Public health
R&D
Recall
Research & development
Semi-supervised learning
Sensitivity analysis
Studies
Test sets
Texts
Training
Unstructured data
Title Extracting Diagnoses and Investigation Results from Unstructured Text in Electronic Health Records by Semi-Supervised Machine Learning
URI https://www.ncbi.nlm.nih.gov/pubmed/22276193
https://www.proquest.com/docview/1323078583
https://www.proquest.com/docview/918037056
https://pubmed.ncbi.nlm.nih.gov/PMC3261909
https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0030412&type=printable
https://doaj.org/article/d17e9e6b6da44b479e895c355a40dd42
http://dx.doi.org/10.1371/journal.pone.0030412
UnpaywallVersion publishedVersion
Volume 7