PAM clustering algorithm based on mutual information matrix for ATR-FTIR spectral feature selection and disease diagnosis

Bibliographic Details
Published in BMC medical research methodology Vol. 25; no. 1; pp. 225 - 13
Main Authors Condino, Francesca; Crocco, Maria Caterina; Guzzi, Rita
Format Journal Article
Language English
Published London BioMed Central 01.10.2025
BioMed Central Ltd
Springer Nature B.V
BMC
ISSN 1471-2288
DOI 10.1186/s12874-025-02667-2

More Information
Summary: ATR-FTIR spectral data are a valuable source of information for a wide range of pathologies, including neurological disorders, and can be used for disease discrimination. This requires identifying the potential spectral biomarkers among all possible candidates, but the size of a spectral dataset and the redundancy among its variables can make selecting the most informative features cumbersome. Here, a novel approach is proposed that performs feature selection by exploiting the redundant information among spectral data. In particular, we consider the Partitioning Around Medoids (PAM) algorithm applied to a dissimilarity matrix derived from the mutual information measure, in order to obtain groups of variables (wavenumbers) with similar patterns of pairwise dependence. An advantage of this algorithm over other, more widely used clustering methods is that it facilitates the interpretation of results, since the centre of each cluster, the so-called medoid, corresponds to an observed data point. Consequently, each medoid can be taken as representative of all the wavenumbers in its cluster and retained in the subsequent statistical methods for disease prediction. Finally, an application to real data shows the ability of the proposed approach to discriminate between patients affected by multiple sclerosis and healthy subjects.
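The pipeline outlined in the summary (mutual-information dissimilarity, PAM clustering, medoids kept as selected features) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. Several choices are assumptions that the summary does not specify: the equal-width binning used to estimate mutual information, the dissimilarity 1 - I(X;Y)/H(X,Y), the number of clusters, and a simple greedy BUILD plus exhaustive SWAP variant of PAM. The toy columns and wavenumber labels are hypothetical.

```python
import math
from collections import Counter

def discretize(values, bins=4):
    """Equal-width binning of one variable (the bin count is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mi_dissimilarity(x, y):
    """Assumed dissimilarity 1 - I(X;Y)/H(X,Y) between two binned variables."""
    hxy = entropy(list(zip(x, y)))
    if hxy == 0:
        return 0.0
    mi = max(0.0, entropy(x) + entropy(y) - hxy)
    return 1.0 - mi / hxy

def dissimilarity_matrix(columns, bins=4):
    """Pairwise dissimilarities between variables (one column per wavenumber)."""
    binned = [discretize(c, bins) for c in columns]
    p = len(binned)
    return [[mi_dissimilarity(binned[i], binned[j]) for j in range(p)]
            for i in range(p)]

def total_cost(d, medoids):
    """Sum over all variables of the dissimilarity to the nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))

def pam(d, k):
    """Greedy BUILD, then SWAP until no single swap lowers the total cost."""
    n = len(d)
    medoids = []
    for _ in range(k):  # BUILD: add the variable that reduces the cost most
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(d, medoids + [i]))
        medoids.append(best)
    improved = True
    while improved:  # SWAP: try replacing each medoid with each non-medoid
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if total_cost(d, trial) < total_cost(d, medoids) - 1e-12:
                    medoids, improved = trial, True
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Hypothetical toy data: 4 wavenumbers measured on 8 samples.
# Columns 0 and 1 are perfectly dependent, as are columns 2 and 3.
wavenumbers = [1650, 1652, 1540, 1542]  # illustrative labels only
a = [0, 1, 2, 3, 4, 5, 6, 7]
columns = [a, [2 * v + 1 for v in a],
           [0, 1, 0, 1, 0, 1, 0, 1], [5, 3, 5, 3, 5, 3, 5, 3]]

dissim = dissimilarity_matrix(columns)
medoids, labels = pam(dissim, k=2)
selected = [wavenumbers[m] for m in medoids]
print("selected wavenumbers:", selected)
print("cluster labels:", labels)
```

In this sketch the medoid indices point at actual columns of the data, so the selected wavenumbers could be passed unchanged to a downstream classifier, which mirrors the interpretability argument made in the summary.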