High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)

Bibliographic Details
Published in Nature protocols Vol. 14; no. 12; pp. 3426 - 3444
Main Authors Zhang, Yichi, Cai, Tianrun, Yu, Sheng, Cho, Kelly, Hong, Chuan, Sun, Jiehuan, Huang, Jie, Ho, Yuk-Lam, Ananthakrishnan, Ashwin N., Xia, Zongqi, Shaw, Stanley Y., Gainer, Vivian, Castro, Victor, Link, Nicholas, Honerlaw, Jacqueline, Huang, Sicong, Gagnon, David, Karlson, Elizabeth W., Plenge, Robert M., Szolovits, Peter, Savova, Guergana, Churchill, Susanne, O’Donnell, Christopher, Murphy, Shawn N., Gaziano, J. Michael, Kohane, Isaac, Cai, Tianxi, Liao, Katherine P.
Format Journal Article
Language English
Published London Nature Publishing Group UK 01.12.2019
Nature Publishing Group
ISSN 1754-2189
1750-2799
DOI 10.1038/s41596-019-0227-6

Abstract Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1–2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no). PheCAP takes structured data and narrative notes from electronic medical records and enables patients with a particular clinical phenotype to be identified.
Audience Academic
AuthorAffiliation_xml – name: 1 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
– name: 2 Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA, USA
– name: 3 Center for Statistical Science, Tsinghua University, Beijing, China
– name: 4 Department of Industrial Engineering, Tsinghua University, Beijing, China
– name: 5 Division of Data Sciences, VA Boston Healthcare System, Boston, MA
– name: 6 Department of Gastroenterology, Massachusetts General Hospital, Boston, MA
– name: 7 Department of Neurology, University of Pittsburgh, Pittsburgh, PA
– name: 8 Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA
– name: 9 Research Information Science and Computing, Partners Healthcare, Boston, MA
– name: 10 Department of Biostatistics, Boston University, Boston, MA, USA
– name: 11 Inflammation & Immunology Thematic Center of Excellence (TCoE) Unit, Celgene, Cambridge, MA (contribution to study prior to current affiliation)
– name: 12 Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA
– name: 13 Computational Health Informatics Program, Children’s Hospital, Boston, MA
– name: 14 Department of Biomedical Informatics, Harvard Medical School, Boston, MA
– name: 15 Division of Cardiology, VA Boston Healthcare System, Boston, MA
– name: 16 Department of Neurology, Massachusetts General Hospital, Boston, MA
– name: 17 Division of Aging, Brigham and Women’s Hospital, Boston, MA
Author_xml – sequence: 1
  givenname: Yichi
  surname: Zhang
  fullname: Zhang, Yichi
  organization: Department of Biostatistics, Harvard T.H. Chan School of Public Health
– sequence: 2
  givenname: Tianrun
  surname: Cai
  fullname: Cai, Tianrun
  organization: Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital
– sequence: 3
  givenname: Sheng
  surname: Yu
  fullname: Yu, Sheng
  organization: Center for Statistical Science, Tsinghua University, Department of Industrial Engineering, Tsinghua University
– sequence: 4
  givenname: Kelly
  surname: Cho
  fullname: Cho, Kelly
  organization: Division of Data Sciences, VA Boston Healthcare System, Division of Aging, Brigham and Women’s Hospital
– sequence: 5
  givenname: Chuan
  surname: Hong
  fullname: Hong, Chuan
  organization: Department of Biostatistics, Harvard T.H. Chan School of Public Health
– sequence: 6
  givenname: Jiehuan
  surname: Sun
  fullname: Sun, Jiehuan
  organization: Department of Biostatistics, Harvard T.H. Chan School of Public Health
– sequence: 7
  givenname: Jie
  surname: Huang
  fullname: Huang, Jie
  organization: Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital
– sequence: 8
  givenname: Yuk-Lam
  orcidid: 0000-0003-3305-3830
  surname: Ho
  fullname: Ho, Yuk-Lam
  organization: Division of Data Sciences, VA Boston Healthcare System
– sequence: 9
  givenname: Ashwin N.
  surname: Ananthakrishnan
  fullname: Ananthakrishnan, Ashwin N.
  organization: Department of Gastroenterology, Massachusetts General Hospital
– sequence: 10
  givenname: Zongqi
  orcidid: 0000-0003-1500-2589
  surname: Xia
  fullname: Xia, Zongqi
  organization: Department of Neurology, University of Pittsburgh
– sequence: 11
  givenname: Stanley Y.
  surname: Shaw
  fullname: Shaw, Stanley Y.
  organization: Division of Cardiovascular Medicine, Brigham and Women’s Hospital
– sequence: 12
  givenname: Vivian
  surname: Gainer
  fullname: Gainer, Vivian
  organization: Research Information Science and Computing, Partners Healthcare
– sequence: 13
  givenname: Victor
  surname: Castro
  fullname: Castro, Victor
  organization: Research Information Science and Computing, Partners Healthcare
– sequence: 14
  givenname: Nicholas
  surname: Link
  fullname: Link, Nicholas
  organization: Division of Data Sciences, VA Boston Healthcare System
– sequence: 15
  givenname: Jacqueline
  surname: Honerlaw
  fullname: Honerlaw, Jacqueline
  organization: Division of Data Sciences, VA Boston Healthcare System
– sequence: 16
  givenname: Sicong
  surname: Huang
  fullname: Huang, Sicong
  organization: Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital
– sequence: 17
  givenname: David
  surname: Gagnon
  fullname: Gagnon, David
  organization: Division of Data Sciences, VA Boston Healthcare System, Department of Biostatistics, Boston University
– sequence: 18
  givenname: Elizabeth W.
  surname: Karlson
  fullname: Karlson, Elizabeth W.
  organization: Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital
– sequence: 19
  givenname: Robert M.
  surname: Plenge
  fullname: Plenge, Robert M.
  organization: Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital
– sequence: 20
  givenname: Peter
  surname: Szolovits
  fullname: Szolovits, Peter
  organization: Department of Electrical Engineering and Computer Science, MIT
– sequence: 21
  givenname: Guergana
  surname: Savova
  fullname: Savova, Guergana
  organization: Computational Health Informatics Program, Boston Children’s Hospital
– sequence: 22
  givenname: Susanne
  surname: Churchill
  fullname: Churchill, Susanne
  organization: Department of Biomedical Informatics, Harvard Medical School
– sequence: 23
  givenname: Christopher
  surname: O’Donnell
  fullname: O’Donnell, Christopher
  organization: Division of Data Sciences, VA Boston Healthcare System, Division of Cardiology, VA Boston Healthcare System
– sequence: 24
  givenname: Shawn N.
  surname: Murphy
  fullname: Murphy, Shawn N.
  organization: Research Information Science and Computing, Partners Healthcare, Department of Biomedical Informatics, Harvard Medical School, Department of Neurology, Massachusetts General Hospital
– sequence: 25
  givenname: J. Michael
  surname: Gaziano
  fullname: Gaziano, J. Michael
  organization: Division of Data Sciences, VA Boston Healthcare System, Division of Aging, Brigham and Women’s Hospital
– sequence: 26
  givenname: Isaac
  surname: Kohane
  fullname: Kohane, Isaac
  organization: Department of Biomedical Informatics, Harvard Medical School
– sequence: 27
  givenname: Tianxi
  surname: Cai
  fullname: Cai, Tianxi
  organization: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Department of Biomedical Informatics, Harvard Medical School
– sequence: 28
  givenname: Katherine P.
  orcidid: 0000-0002-4797-3200
  surname: Liao
  fullname: Liao, Katherine P.
  email: kliao@bwh.harvard.edu
  organization: Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Division of Data Sciences, VA Boston Healthcare System, Department of Biomedical Informatics, Harvard Medical School
BackLink https://www.ncbi.nlm.nih.gov/pubmed/31748751$$D View this record in MEDLINE/PubMed
Copyright The Author(s), under exclusive licence to Springer Nature Limited 2019
COPYRIGHT 2019 Nature Publishing Group
Copyright Nature Publishing Group Dec 2019
Discipline Biology
EISSN 1750-2799
EndPage 3444
ExternalDocumentID oai:dash.harvard.edu:1/42083016
PMC7323894
A606960849
31748751
10_1038_s41596_019_0227_6
Genre Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GeographicLocations United States
GrantInformation_xml – fundername: Pfizer (Pfizer Inc.)
  funderid: https://doi.org/10.13039/100004319
– fundername: U.S. Department of Health & Human Services | National Institutes of Health (NIH)
  grantid: P30 AR072577; U54 LM008748; R01 NS098023; R01 HG009174; T32 AR007530
  funderid: https://doi.org/10.13039/100000002
– fundername: Harold and DuVal Bowen Fund
– fundername: Office of Research and Development (VHA Office of Research and Development)
  grantid: I01-CX001025
  funderid: https://doi.org/10.13039/100006379
– fundername: NIAMS NIH HHS
  grantid: P30 AR072577
– fundername: NHGRI NIH HHS
  grantid: R01 HG009174
– fundername: CSRD VA
  grantid: I01 CX001025
– fundername: NLM NIH HHS
  grantid: U54 LM008748
– fundername: NINDS NIH HHS
  grantid: R01 NS098023
– fundername: NIAMS NIH HHS
  grantid: T32 AR007530
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License cc-by
AUTHOR CONTRIBUTIONS
YZ, TC1, SY, CH, JS, JH, ANA, ZX, SYS, VG, VC, NL, DWK, RMP, PS, GS, SC, SNM, IK, TC2, and KPL contributed to the development of the pipeline; YZ, TC1, SY, CH, JS, NL, and TC2 contributed to the development of the R package and software used in this protocol; YZ, TC, KC, CH, JS, JH, HL, ANA, ZX, SYS, VG, VC, NL, JH, SH, DG, PS, GS, SC, CO, SNM, JMG, IK, TC, and KPL contributed to the validation of and enhancements to the pipeline; YZ, TC1, SY, CH, JS, VG, VC, GS, TC2, and KPL drafted the manuscript; all authors contributed to revisions and proofreading of the manuscript.
contributed equally to the work
ORCID 0000-0002-4797-3200
0000-0003-1500-2589
0000-0003-3305-3830
OpenAccessLink https://proxy.k.utb.cz/login?url=http://nrs.harvard.edu/urn-3:HUL.InstRepos:42083016
PMID 31748751
PageCount 19
PublicationDate 2019-12-01
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationSubtitle Recipes for Researchers
PublicationTitle Nature protocols
PublicationTitleAbbrev Nat Protoc
PublicationYear 2019
Publisher Nature Publishing Group UK
Nature Publishing Group
SubjectTerms 631/1647/48
631/1647/794
692/53
692/699
Algorithms
Analysis
Analytical Chemistry
Biological Techniques
Biomedical and Life Sciences
Computational Biology/Bioinformatics
Data Analysis
Data Interpretation, Statistical
Electronic health records
Electronic Health Records - statistics & numerical data
Electronic medical records
Electronic records
Genotype & phenotype
Health risks
High-throughput screening (Biochemical assaying)
High-Throughput Screening Assays - methods
Humans
Learning algorithms
Life Sciences
Machine Learning
Medical records
Methods
Microarrays
Natural Language Processing
Organic Chemistry
Patients
Phenotype
Phenotypes
Phenotyping
Protocol
Title High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
URI https://link.springer.com/article/10.1038/s41596-019-0227-6
https://www.ncbi.nlm.nih.gov/pubmed/31748751
https://www.proquest.com/docview/2319190907
https://www.proquest.com/docview/2316783592
https://pubmed.ncbi.nlm.nih.gov/PMC7323894
http://nrs.harvard.edu/urn-3:HUL.InstRepos:42083016
UnpaywallVersion submittedVersion
Volume 14