An Interpretable Deep Embedding Model for Few and Imbalanced Biomedical Data

Bibliographic Details
Published in: IEEE Journal of Biomedical and Health Informatics, Vol. PP, pp. 1-8
Main Authors: Wang, Haishuai; Yang, Jianjun; Tao, Guangyu; Ma, Jiali; Chi, Lianhua; Wu, Jun; Zhao, Ziping
Format: Journal Article
Language: English
Published: United States, IEEE, 21.11.2022
ISSN: 2168-2194
EISSN: 2168-2208
DOI: 10.1109/JBHI.2022.3223798

Summary: In healthcare, training examples are usually hard to obtain (e.g., cases of a rare disease), or the cost of labelling data is high. With a large number of features (p) measured in a relatively small number of samples (N), the "big p, small N" problem is an important subject in healthcare studies, especially for genomic data. Another major challenge in effectively analyzing medical data is the skewed class distribution caused by the imbalance between different class labels. In addition, feature importance and interpretability play a crucial role in the success of solving medical problems. Therefore, in this paper, we present an interpretable deep embedding model (IDEM) to classify new data after having seen only a few training examples with a highly skewed class distribution. The IDEM model consists of a feature attention layer to learn informative features, a feature embedding layer to directly handle both numerical and categorical features, and a Siamese network with a contrastive loss to compare the similarity between the learned embeddings of two input samples. Experiments on both synthetic data and real-world medical data demonstrate that our IDEM model has better generalization power than conventional approaches with few and imbalanced medical training samples, and that it is able to identify which features contribute to the classifier in distinguishing case from control.
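
To make the described architecture concrete, below is a minimal PyTorch sketch of an IDEM-style model: a per-feature attention layer, an embedding layer for numerical and categorical inputs, and a Siamese encoder trained with a contrastive loss. This is an illustrative reconstruction based only on the abstract, not the authors' implementation; all class names (FeatureAttention, FeatureEmbedding, SiameseIDEM), hyperparameters, and the exact wiring of attention before embedding are assumptions.

# Hypothetical IDEM-style sketch; layer names and wiring are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAttention(nn.Module):
    """Learns a soft importance weight per input feature (supports interpretability)."""
    def __init__(self, num_features):
        super().__init__()
        self.score = nn.Linear(num_features, num_features)

    def forward(self, x):
        weights = torch.softmax(self.score(x), dim=-1)  # per-feature attention weights
        return x * weights, weights

class FeatureEmbedding(nn.Module):
    """Embeds categorical features and projects numerical ones into a shared space."""
    def __init__(self, num_numerical, categorical_cardinalities, embed_dim):
        super().__init__()
        self.cat_embeds = nn.ModuleList(
            nn.Embedding(card, embed_dim) for card in categorical_cardinalities
        )
        self.num_proj = nn.Linear(num_numerical, embed_dim)

    def forward(self, x_num, x_cat):
        cat_vecs = [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeds)]
        return self.num_proj(x_num) + sum(cat_vecs)

class SiameseIDEM(nn.Module):
    """Twin encoder: the same weights embed both samples of a pair."""
    def __init__(self, num_numerical, categorical_cardinalities, embed_dim=32):
        super().__init__()
        self.attention = FeatureAttention(num_numerical)
        self.embedding = FeatureEmbedding(num_numerical, categorical_cardinalities, embed_dim)
        self.encoder = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU())

    def encode(self, x_num, x_cat):
        x_num, attn = self.attention(x_num)
        return self.encoder(self.embedding(x_num, x_cat)), attn

    def forward(self, a_num, a_cat, b_num, b_cat):
        za, _ = self.encode(a_num, a_cat)
        zb, _ = self.encode(b_num, b_cat)
        return za, zb

def contrastive_loss(za, zb, same_class, margin=1.0):
    """Pulls same-class pairs together, pushes different-class pairs beyond a margin."""
    d = F.pairwise_distance(za, zb)
    return torch.mean(
        same_class * d.pow(2)
        + (1 - same_class) * torch.clamp(margin - d, min=0).pow(2)
    )

Pair-based training of this kind is one common way to handle few and imbalanced samples, since every case sample can be paired with many controls to form training pairs; the learned attention weights can then be inspected to see which features drive the case/control separation.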