PAM clustering algorithm based on mutual information matrix for ATR-FTIR spectral feature selection and disease diagnosis

Bibliographic Details
Published in	BMC medical research methodology Vol. 25; no. 1; pp. 225 - 13
Main Authors	Condino, Francesca, Crocco, Maria Caterina, Guzzi, Rita
Format	Journal Article
Language	English
Published	London: BioMed Central, 01.10.2025 (BioMed Central Ltd; Springer Nature B.V; BMC)
Subjects	Algorithms; Cluster Analysis; Clustering; Clustering Algorithms; Data analysis; Dependence; Diagnosis; Disease; Diseases; Dissimilarity measure; Feature selection; Fourier transform infrared spectroscopy; Health Sciences; High-dimensional statistics and omics data analysis; Humans; Italy; Lipids; Medical diagnosis; Medical research; Medicine; Medicine & Public Health; Medicine, Experimental; Methods; Multiple Sclerosis - diagnosis; Neurological disorders; Physiological aspects; Principal components analysis; Random variables; Shannon entropy; Spectroscopy, Fourier Transform Infrared - methods; Spectrum analysis; Statistical Theory and Methods; Statistics for Life Sciences; Theory of Medicine/Bioethics
ISSN	1471-2288
DOI	10.1186/s12874-025-02667-2

Summary:	ATR-FTIR spectral data are a valuable source of information in a wide range of pathologies, including neurological disorders, and can be used for disease discrimination. To this end, potential spectral biomarkers must be identified among all possible candidates, but the amount of information in a spectral dataset and the redundancy among the data can make selecting the most informative features cumbersome. Here, a novel approach is proposed that performs feature selection based on the redundant information among spectral data. In particular, we consider the Partitioning Around Medoids (PAM) algorithm, applied to a dissimilarity matrix derived from the mutual information measure, to obtain groups of variables (wavenumbers) with similar patterns of pairwise dependence. An advantage of this grouping algorithm over other, more widely used clustering methods is that it facilitates the interpretation of results, since the centre of each cluster, the so-called medoid, corresponds to an observed data point. As a consequence, each medoid can be considered representative of all the wavenumbers belonging to its cluster and retained in the subsequent statistical methods for disease prediction. Finally, an application to real data is reported to show the ability of the proposed approach to discriminate between patients affected by multiple sclerosis and healthy subjects.
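The pipeline outlined in the summary (mutual information between wavenumbers, a dissimilarity matrix built from it, PAM clustering, and retention of one medoid per cluster) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each wavenumber is discretized with equal-width bins, mutual information is estimated with plug-in Shannon entropies, and the dissimilarity is assumed to be d_ij = 1 - I(X_i;X_j)/sqrt(H(X_i)H(X_j)). The summary specifies neither the estimator nor the exact MI-to-dissimilarity transformation. All function names (`discretize`, `mi_dissimilarity`, `pam`, ...) are hypothetical, the number of clusters `k` is supplied by hand, and the PAM shown is a plain BUILD/SWAP version suited only to small matrices.

```python
import math
import random
from collections import Counter
from itertools import combinations


def discretize(values, n_bins):
    """Equal-width binning of one wavenumber's absorbances across samples."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) puts the maximum value into the last bin instead of bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Plug-in Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two discretized sequences."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def mi_dissimilarity(columns, n_bins=5):
    """Assumed dissimilarity: d_ij = 1 - I_ij / sqrt(H_i * H_j), with d_ii = 0."""
    disc = [discretize(col, n_bins) for col in columns]
    h = [entropy(d) for d in disc]
    p = len(columns)
    dist = [[0.0] * p for _ in range(p)]
    for i, j in combinations(range(p), 2):
        denom = math.sqrt(h[i] * h[j])
        nmi = mutual_information(disc[i], disc[j]) / denom if denom > 0 else 0.0
        dist[i][j] = dist[j][i] = 1.0 - nmi
    return dist


def total_cost(dist, medoids):
    """Sum over all variables of the dissimilarity to their closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Basic PAM: greedy BUILD, then SWAP while some exchange lowers the cost."""
    n = len(dist)
    medoids = []
    for _ in range(k):  # BUILD: add the point that most reduces the cost
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    cost = total_cost(dist, medoids)
    improved = True
    while improved:  # SWAP: replace a medoid by a non-medoid if that helps
        improved = False
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    # Each variable is labelled by the index of its closest medoid
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples = 200
    latent = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(2)]
    # Six synthetic "wavenumbers": 0-2 track latent[0], 3-5 track latent[1]
    columns = [[x + 0.1 * rng.gauss(0, 1) for x in latent[g]]
               for g in (0, 0, 0, 1, 1, 1)]
    medoids, labels = pam(mi_dissimilarity(columns), k=2)
    print("retained wavenumber indices:", medoids)
    print("cluster label of each wavenumber:", labels)
```

On this synthetic data, where two groups of variables share a common latent signal, the two groups should be recovered. The medoid indices then play the role of the representative wavenumbers that would be passed on to a downstream classifier (not shown). Because PAM only needs the dissimilarity matrix, any other MI-based dissimilarity can be swapped into `mi_dissimilarity` without changing `pam`.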