2416 A machine learning pipeline to predict acute kidney injury (AKI) in patients without AKI in their most recent hospitalization

Bibliographic Details
Published in	Journal of clinical and translational science Vol. 1; no. S1; pp. 17 - 18
Main Authors	Weisenthal, Samuel J.; Quill, Caroline; Luo, Jiebo; Kautz, Henry; Farooq, Samir; Zand, Martin
Format	Journal Article
Language	English
Published	Cambridge: Cambridge University Press, 01.09.2017
Subjects	Age; Artificial intelligence; Bicarbonates; Biomedical Informatics/Health Informatics; Calcium chloride; Calcium phosphates; Canopy gaps; Codes; Creatinine; Data processing; Datasets; Diagnosis; Distance learning; Electronic medical records; Entropy; Gangrene; Glomerular filtration rate; Hemoglobin; Hospitalization; Kidneys; Laboratories; Learning algorithms; Liver; Minority & ethnic groups; Necrosis; Patients; Population studies; Renal failure; Risk assessment; Scaling; Urea
ISSN	2059-8661
DOI	10.1017/cts.2017.75

Abstract	OBJECTIVES/SPECIFIC AIMS: Our objective was to develop and evaluate a machine learning pipeline that uses electronic health record (EHR) data to predict acute kidney injury (AKI) during rehospitalization for patients who did not have an AKI episode in their most recent hospitalization.

METHODS/STUDY POPULATION: The protocol under which this study falls was given exempt status by our institutional review board. The fully deidentified data set, containing all adult hospital admissions during a 2-year period, combines administrative, laboratory, and pharmaceutical information. The administrative data set includes International Classification of Diseases, 9th Revision (ICD-9) diagnosis and procedure codes, Current Procedural Terminology, 4th Edition (CPT-4) procedure codes, diagnosis-related grouping (DRG) codes, locations visited in the hospital, discharge disposition, insurance, marital status, gender, age, ethnicity, and total length of stay. The laboratory data set includes bicarbonate, chloride, calcium, anion gap, phosphate, glomerular filtration rate, creatinine, urea nitrogen, albumin, total protein, liver function enzymes, and hemoglobin A1c. The pharmacy data set includes, for each medication, a description, pharmacologic class and subclass, and therapeutic class. Data preprocessing was performed using the Python library pandas (McKinney, 2011). Top-level binary representation (Singh, 2015) was used for diagnosis and procedure codes. Categorical variables were transformed via one-hot encoding. Previous admissions were collapsed using rules informed by domain expertise (e.g., the most recent age or the sum of assigned diagnosis codes was retained as an element in the feature vector). We excluded any patient without at least 1 rehospitalization during the time window. We also excluded any admission, with or without AKI, for which AKI was present in the most recent hospitalization.
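The two exclusion rules above can be sketched in pandas roughly as follows. This is a minimal illustration, not the authors' code: the column names (`patient_id`, `admit_time`, `aki`), the toy data, and the helper `select_rehospitalizations` are all hypothetical.

```python
import pandas as pd

# Hypothetical toy admissions table; column names are assumptions, not the study's schema.
admissions = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 3, 3],
    "admit_time": pd.to_datetime([
        "2015-01-01", "2015-03-01", "2015-06-01",
        "2015-02-01", "2015-01-15", "2015-04-15",
    ]),
    "aki": [False, True, False, False, True, True],
})


def select_rehospitalizations(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rehospitalizations whose immediately preceding stay had no AKI."""
    df = df.sort_values(["patient_id", "admit_time"]).copy()
    # AKI status of the same patient's previous admission (missing for a first admission).
    df["prev_aki"] = df.groupby("patient_id")["aki"].shift(1)
    # Rule 1: a first admission has no prior stay, so it is not a rehospitalization.
    has_prior = df["prev_aki"].notna()
    # Rule 2: drop admissions (with or without AKI) that directly follow an AKI stay.
    prior_without_aki = df["prev_aki"].eq(False)
    return df[has_prior & prior_without_aki].drop(columns="prev_aki")


kept = select_rehospitalizations(admissions)
print(kept)
```

In the toy table only patient 1's second admission survives: patient 2 has no rehospitalization, and the remaining rehospitalizations directly follow a stay with AKI.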
For comparison, we did not exclude such admissions in an otherwise identical experiment in which any AKI event counted as a positive sample (regardless of AKI presence in the most recent hospitalization). We defined an AKI event as an assignment of any of the acute kidney failure (AKF) ICD-9 codes: 584.5 (AKF with lesion of tubular necrosis), 584.6 (AKF with lesion of renal cortical necrosis), 584.7 (AKF with lesion of renal medullary [papillary] necrosis), 584.8 (AKF with other specified pathological lesion in kidney), or 584.9 (AKF, unspecified). Since diagnosis codes are believed to be specific but not sensitive for AKI (Waikar, 2006), we supplemented them using creatinine for patients who had laboratory values. Laboratory-based diagnosis followed the Kidney Disease: Improving Global Outcomes (KDIGO) Practice Guidelines (AKI defined as a 1.5-fold or greater increase in serum creatinine from baseline within 7 d, or a 0.3 mg/dL or greater increase in serum creatinine within 48 h). We report preliminary model discrimination via area under the receiver operating characteristic curve (AUC) using k-fold cross validation grouped by patient identifier (to ensure that admissions from the same patient did not appear in both the training and validation sets). We confirmed that the prevalence of positive samples in the entire data set was maintained in each fold. The Python library scikit-learn (Pedregosa, 2011) was used for pipeline development, which consisted of imputation, scaling, and hyper-parameter tuning for penalized (l1 and l2 norm) logistic regression, random forest, and multilayer perceptron classifiers. All experiments were stored in IPython (Pérez, 2007) notebooks for easy viewing and result reproduction.

RESULTS/ANTICIPATED RESULTS: There were 107,036 adult patients who accounted for 199,545 admissions during the 2-year window. Per admission, there were at most 54 ICD-9 diagnoses, 38 ICD-9 procedures, 314 CPT-4 procedures, and 25 hospital locations visited.
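The KDIGO creatinine criteria quoted in the methods can be illustrated with a small check over a patient's serum creatinine series. This is only a sketch under stated assumptions: the abstract does not say how baseline creatinine was chosen, so the code treats every earlier measurement inside the relevant window as a candidate baseline, and the function name `meets_kdigo_creatinine` is hypothetical.

```python
from datetime import datetime, timedelta

# Small tolerance so that values such as 1.2 - 0.9 are not rejected by float rounding.
EPS = 1e-9


def meets_kdigo_creatinine(measurements):
    """Return True if a creatinine series satisfies either KDIGO criterion.

    measurements: iterable of (datetime, serum creatinine in mg/dL).
    Assumption: any earlier value inside the window may serve as the baseline.
    """
    ordered = sorted(measurements)
    for i, (t_now, scr_now) in enumerate(ordered):
        for t_prev, scr_prev in ordered[:i]:
            elapsed = t_now - t_prev
            # Criterion 1: rise of at least 0.3 mg/dL within 48 h.
            if elapsed <= timedelta(hours=48) and scr_now - scr_prev >= 0.3 - EPS:
                return True
            # Criterion 2: at least a 1.5-fold rise within 7 d.
            if elapsed <= timedelta(days=7) and scr_now >= 1.5 * scr_prev - EPS:
                return True
    return False


t0 = datetime(2015, 1, 1, 8, 0)
absolute_rise = [(t0, 1.0), (t0 + timedelta(hours=24), 1.4)]
relative_rise = [(t0, 0.8), (t0 + timedelta(days=5), 1.3)]
no_event = [(t0, 1.0), (t0 + timedelta(hours=72), 1.4)]

print(meets_kdigo_creatinine(absolute_rise))
print(meets_kdigo_creatinine(relative_rise))
print(meets_kdigo_creatinine(no_event))
```

The third series rises by 0.4 mg/dL, but only after 72 h and by less than 1.5-fold, so neither criterion fires.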
The admissions were 55% female, the mean age was 46±20 years (±standard deviation), and the mean length of stay was 2.5±8.0 days. We excluded 2360 admissions that involved an AKI event and directly followed an admission with an AKI event, and 4130 admissions that did not involve an AKI event but directly followed an admission with an AKI event. In total, there were 4561 (5.3%) positive samples (AKI during rehospitalization without AKI in the previous stay) generated by 3699 unique patients and 81,458 negative samples (no AKI during rehospitalization and no AKI in the previous stay) generated by 31,831 unique patients. When using any AKI event as a positive sample (regardless of whether AKI was present in the most recent stay), the prevalence was 7.3% (6921 positive samples generated by 4395 unique patients and 85,588 negative samples generated by 33,287 unique patients). Best results were achieved with a code precision of 3 digits, for which we had a total of 4556 features per patient. The fitted hyper-parameters for each classifier were: logistic regression with l1 penalty, C of 2×10⁻³; logistic regression with l2 penalty, C of 1×10⁻⁶; random forest, 100 estimators, maximum depth of 3, minimum of 50 samples per leaf, minimum of 10 samples per split, and entropy as the splitting criterion; and multilayer perceptron, l2 regularization parameter α of 15, architecture of 1 hidden layer with 5 units, and learning rate of 0.001. Five-fold stratified cross validation on the development set yielded average AUCs of 0.830±0.006 for logistic regression with l1 penalty, 0.796±0.007 for logistic regression with l2 penalty, 0.828±0.007 for random forest, and 0.841±0.005 for multilayer perceptron. In an identical experiment in which an AKI event was considered a positive sample regardless of AKI presence in the most recent stay, we had 4592 features per sample with the same code precision.
Five-fold stratified cross validation on the development set with identical hyper-parameter settings yielded average AUCs of 0.850±0.004 for logistic regression with l1 penalty, 0.819±0.006 for logistic regression with l2 penalty, 0.853±0.004 for random forest, and 0.853±0.006 for multilayer perceptron.

DISCUSSION/SIGNIFICANCE OF IMPACT: Our objective was to investigate the feasibility of using machine learning methods on EHR data to provide a personalized risk assessment for “unexpected” AKI in rehospitalized patients. Preliminary model discrimination was good, suggesting that this approach is feasible. Such a model could help clinicians recognize AKI risk in patients who would not otherwise raise suspicion. The authors recognize several limitations. Since our data set corresponds to a time-window sample, patients with a high frequency of hospital utilization are likely overrepresented. Similarly, our data set contains records from only 1 hospital network. Although we supplemented with laboratory-based diagnosis, using diagnosis codes as labels is problematic because numerous reports suggest low sensitivity of codes for AKI. Future work includes calibration analysis, incremental updating (“online learning”), and a representation learning-based (“deep learning”) extension of the model.
Copyright	The Association for Clinical and Translational Science 2018. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
ExternalDocumentID	PMC6798697 10_1017_cts_2017_75
ISSN	2059-8661
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	S1
Language	English
License	http://creativecommons.org/licenses/by/4.0 This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
OpenAccessLink	http://dx.doi.org/10.1017/cts.2017.75
PageCount	2
PublicationCentury	2000
PublicationDate	20170901
PublicationDateYYYYMMDD	2017-09-01
PublicationDecade	2010
PublicationPlace	Cambridge
PublicationTitle	Journal of clinical and translational science
PublicationYear	2017
Publisher	Cambridge University Press
StartPage	17
SubjectTerms	Age Artificial intelligence Bicarbonates Biomedical Informatics/Health Informatics Calcium chloride Calcium phosphates Canopy gaps Codes Creatinine Data processing Datasets Diagnosis Distance learning Electronic medical records Entropy Gangrene Glomerular filtration rate Hemoglobin Hospitalization Kidneys Laboratories Learning algorithms Liver Minority & ethnic groups Necrosis Patients Population studies Renal failure Risk assessment Scaling Urea
Subtitle	A machine learning pipeline to predict acute kidney injury (AKI) in patients without AKI in their most recent hospitalization
Title	2416
URI	https://www.proquest.com/docview/2036837543 https://pubmed.ncbi.nlm.nih.gov/PMC6798697
Volume	1