Using variable combination population analysis for variable selection in multivariate calibration



Bibliographic Details
Published in	Analytica chimica acta, Vol. 862, pp. 14-23
Main Authors	Yun, Yong-Huan, Wang, Wei-Ting, Deng, Bai-Chuan, Lai, Guang-Bi, Liu, Xin-bo, Ren, Da-Bing, Liang, Yi-Zeng, Fan, Wei, Xu, Qing-Song
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 03.03.2015
Subjects	Algorithms; analytical chemistry; Calibration; data collection; evolution; Exponentially decreasing function; Internet; least squares; Least-Squares Analysis; Model population analysis; Models, Statistical; Monte Carlo Method; Multivariate Analysis; Multivariate calibration; Partial least squares; selection methods; Variable combination; Variable selection; wavelengths
ISSN	0003-2670, 1873-4324
DOI	10.1016/j.aca.2014.12.048

Summary:	• Uses a novel sampling strategy that considers the interaction effect among variables.
• Utilizes model population analysis to explore the useful information from a population of models.
• Employs an exponentially decreasing function to iteratively filter the variables and shrink the variable space.
• Performs well when compared with CARS, GA–PLS, MC-UVE-PLS and IRIV.
Variable (wavelength or feature) selection has become a critical step in the analysis of datasets with a large number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), is proposed. The strategy consists of two crucial procedures. First, an exponentially decreasing function (EDF), which follows the simple and effective principle of ‘survival of the fittest’ from Darwin’s theory of natural evolution, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of variable subsets from which a population of sub-models is constructed. Model population analysis (MPA) is then employed to find the variable subsets with the lower root mean square error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed; the higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA performs well when compared with four high-performing variable selection methods: genetic algorithm–partial least squares (GA–PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retains informative variables (IRIV).
The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
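
The loop described in the summary (EDF shrinkage, BMS sampling, frequency counting over the best 10% of sub-models) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code. All function and parameter names are hypothetical, the exponential form used for the EDF is an assumption (the summary does not give the formula), and `score_fn` is a caller-supplied stand-in for the RMSECV of a PLS sub-model built on a variable subset.

```python
import math
import random


def edf_keep_counts(n_vars, n_final, n_runs):
    """Number of variables to keep after each EDF run.

    Assumed form: keep_i = n_vars * exp(-theta * i), with theta chosen so
    that exactly n_final variables remain after the last run.
    """
    theta = math.log(n_vars / n_final) / n_runs
    return [max(n_final, round(n_vars * math.exp(-theta * i)))
            for i in range(1, n_runs + 1)]


def binary_matrix_sampling(n_vars, n_subsets, ratio, rng):
    """0/1 matrix (rows = subsets, columns = variables).

    Every column holds the same number of ones, shuffled independently,
    so each variable is sampled equally often across the population.
    """
    ones = max(1, round(n_subsets * ratio))
    columns = []
    for _ in range(n_vars):
        col = [1] * ones + [0] * (n_subsets - ones)
        rng.shuffle(col)
        columns.append(col)
    return [[columns[j][i] for j in range(n_vars)] for i in range(n_subsets)]


def vcpa_sketch(n_vars, score_fn, n_final=14, n_runs=50,
                n_subsets=1000, ratio=0.5, best_frac=0.1, seed=0):
    """Shrink the variable space run by run; lower score_fn is better."""
    rng = random.Random(seed)
    remaining = list(range(n_vars))
    for keep in edf_keep_counts(n_vars, n_final, n_runs):
        matrix = binary_matrix_sampling(len(remaining), n_subsets, ratio, rng)
        subsets = [[remaining[j] for j, bit in enumerate(row) if bit]
                   for row in matrix]
        subsets = [s for s in subsets if s]  # skip empty combinations
        ranked = sorted(subsets, key=score_fn)  # best (lowest score) first
        n_best = max(1, int(len(ranked) * best_frac))
        # Count how often each variable appears in the best sub-models.
        freq = {v: 0 for v in remaining}
        for subset in ranked[:n_best]:
            for v in subset:
                freq[v] += 1
        by_freq = sorted(remaining, key=lambda v: freq[v], reverse=True)
        remaining = sorted(by_freq[:keep])
    return remaining


if __name__ == "__main__":
    informative = {2, 5, 7}

    def toy_score(subset):
        # Toy stand-in for RMSECV: penalise missing informative variables
        # heavily and subset size slightly.
        return len(informative - set(subset)) + 0.01 * len(subset)

    print(vcpa_sketch(20, toy_score, n_final=3, n_runs=10, n_subsets=200))
```

In practice `score_fn` would fit a PLS model on the chosen columns of the spectra and return its cross-validated error; the toy score here only serves to show the mechanics of the selection loop.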