Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data

Bibliographic Details
Published in	PLoS ONE, Vol. 11, no. 7, p. e0159088
Main Authors	Lee, George; Romo Bucheli, David Edmundo; Madabhushi, Anant
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 15.07.2016
Subjects	Active learning; Algorithms; Artificial Intelligence; Biology and Life Sciences; Biomedical data; Biomedical engineering; Brain; Breast - pathology; Breast Neoplasms - diagnosis; Classification; Computer and Information Sciences; Data mining; Data processing; Diagnosis, Computer-Assisted - methods; Diagnostic imaging; Dimensional analysis; Disease; Embedding; Female; Gene expression; Gene Expression Regulation, Neoplastic; Histopathology; Humans; Image Processing, Computer-Assisted - methods; Learning; Magnetic resonance; Magnetic resonance imaging; Male; Medicine and Health Sciences; Methods; Mitosis; Neuroimaging; NMR; Nuclear magnetic resonance; Ovarian Neoplasms - diagnosis; Ovary - pathology; Pattern Recognition, Automated - methods; Physical Sciences; Principal components analysis; Prostate; Prostate - pathology; Prostate cancer; Prostatic Neoplasms - diagnosis; Prostatic Neoplasms - genetics; Proteomics; Random sampling; Reduction; Representations; Research and Analysis Methods; Statistical sampling; Colombia; Cleveland Ohio; United States > US
ISSN	1932-6203
DOI	10.1371/journal.pone.0159088

More Information
Summary:	Medical diagnostics is often a multi-attribute problem, necessitating sophisticated tools for analyzing high-dimensional biomedical data. Mining these data often runs into two crucial bottlenecks: 1) the high dimensionality of the features used to represent rich biological data and 2) the small amount of labeled training data, owing to the expense of consulting the highly specialized medical expertise needed to assess each study. To our knowledge, no existing approach has used active learning (AL) within dimensionality reduction to improve the construction of low-dimensional representations. We present a novel methodology, AdDReSS (Adaptive Dimensionality Reduction with Semi-Supervision), and demonstrate that fewer labeled instances are needed to create a more discriminative embedding representation when they are identified via AL in the embedding space than when they are selected at random. We tested the methodology on a variety of domains, including prostate gene expression, ovarian proteomic spectra, brain magnetic resonance imaging, and breast histopathology. Across these high-dimensional biomedical datasets, each with 100+ observations, and across all parameters considered, the median classification accuracy over all experiments was higher for AdDReSS (88.7%) than for SSAGE, a semi-supervised dimensionality reduction (SSDR) method using random sampling (85.5%), and for Graph Embedding (81.5%). Furthermore, embeddings generated via AdDReSS achieved a mean 35.95% improvement over SSAGE in Raghavan efficiency, a measure of learning rate. Our results demonstrate the value of AdDReSS in providing low-dimensional representations of high-dimensional biomedical data while achieving higher classification rates with fewer labeled examples than approaches without active learning.
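The idea in the summary, choosing which instances to label by looking at the current embedding rather than sampling at random, can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian affinity, the `boost` factor applied to labeled pairs, the centroid-margin query rule, and all function names are assumptions made only for illustration, and the data are synthetic.

```python
import numpy as np

def semi_supervised_embedding(X, labels, n_dims=2, sigma=2.0, boost=2.0):
    """Spectral graph embedding whose affinities are adjusted by known labels.

    labels holds a class id for labeled points and -1 for unlabeled ones.
    The boost factor is an illustrative assumption, not a published setting.
    """
    n = X.shape[0]
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    known = labels >= 0
    both_known = known[:, None] & known[None, :]
    same = labels[:, None] == labels[None, :]
    # Strengthen edges between same-class labeled pairs, weaken the others.
    W = np.where(both_known & same, W * boost, W)
    W = np.where(both_known & ~same, W / boost, W)
    np.fill_diagonal(W, 0.0)
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized graph Laplacian.
    L = np.eye(n) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector and keep the next n_dims.
    return vecs[:, 1:n_dims + 1] * inv_sqrt_deg[:, None]

def most_ambiguous(embedding, labels):
    """Index of the unlabeled point closest to being equidistant between
    its two nearest class centroids in the embedding."""
    classes = np.unique(labels[labels >= 0])
    centroids = np.stack([embedding[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(embedding[:, None, :] - centroids[None, :, :], axis=-1)
    two_nearest = np.sort(dists, axis=1)[:, :2]
    margin = two_nearest[:, 1] - two_nearest[:, 0]
    margin[labels >= 0] = np.inf  # never re-query a labeled point
    return int(np.argmin(margin))

def active_embedding(X, oracle, seed_idx, n_queries=5):
    """Alternate between embedding and querying the oracle for one label."""
    labels = np.full(X.shape[0], -1)
    labels[seed_idx] = oracle[seed_idx]
    for _ in range(n_queries):
        emb = semi_supervised_embedding(X, labels)
        q = most_ambiguous(emb, labels)
        labels[q] = oracle[q]
    return semi_supervised_embedding(X, labels), labels

# Synthetic two-class data standing in for a biomedical feature matrix.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(1.5, 1.0, (20, 5))])
y = np.repeat([0, 1], 20)
emb, labels = active_embedding(X, y, seed_idx=[0, 1, 20, 21], n_queries=5)
```

Replacing `most_ambiguous` with a uniformly random choice among unlabeled points would give a random-sampling baseline to compare against, in the spirit of the comparison the summary describes.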
Bibliography:	Conceived and designed the experiments: GL. Performed the experiments: GL DR. Analyzed the data: GL. Contributed reagents/materials/analysis tools: GL DR. Wrote the paper: GL DR AM. Competing Interests: The authors have declared that no competing interests exist.