Novel Scalar-on-matrix Regression for Unbalanced Feature Matrices

Image features that characterize tubules from digitized kidney biopsies may offer insight into disease prognosis as novel biomarkers. For each subject, we can construct a matrix whose entries are a common set of image features (e.g., area, orientation, eccentricity) that are measured for each tubule...

Full description

Saved in:
Bibliographic Details
Published inStatistics in biosciences
Main Authors Rubin, Jeremy, Fan, Fan, Barisoni, Laura, Janowczyk, Andrew R., Zee, Jarcy
Format Journal Article
LanguageEnglish
Published United States 05.03.2025
Subjects
Online AccessGet full text
ISSN1867-1764
1867-1772
1867-1772
DOI10.1007/s12561-025-09476-7

Cover

More Information
Summary:Image features that characterize tubules from digitized kidney biopsies may offer insight into disease prognosis as novel biomarkers. For each subject, we can construct a matrix whose entries are a common set of image features (e.g., area, orientation, eccentricity) that are measured for each tubule from that subject’s biopsy. Previous scalar-on-matrix regression approaches which can predict scalar outcomes using image feature matrices cannot handle varying numbers of tubules across subjects. We propose the CLUstering Structured laSSO (CLUSSO), a novel scalar-on-matrix regression technique that allows for unbalanced numbers of tubules, to predict scalar outcomes from the image feature matrices. Through classifying tubules into one of two different clusters, CLUSSO averages and weights tubular feature values within-subject and within-cluster to create balanced feature matrices that can then be used with structured lasso regression. We develop the theoretical large tubule sample properties for the error bounds of the feature coefficient estimates. Simulation study results indicate that CLUSSO often achieves a lower false positive rate and higher true positive rate for identifying the image features which truly affect outcomes relative to a naive method that averages feature values across all tubules. Additionally, we find that CLUSSO has lower bias and can predict outcomes with a competitive accuracy to the naïve approach. Finally, we applied CLUSSO to tubular image features from kidney biopsies of glomerular disease subjects from the Nephrotic Syndrome Study Network (NEPTUNE) to predict kidney function and used subjects from the Cure Glomerulonephropathy (CureGN) study as an external validation set.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1867-1764
1867-1772
1867-1772
DOI:10.1007/s12561-025-09476-7