Identifying Pareto-based solutions for regression subset selection via a feasible solution algorithm

Bibliographic Details
Published in	International Journal of Data Science and Analytics, Vol. 10, no. 3, pp. 277-284
Main Authors	Lambert, Joshua W.; Hawk, Gregory S.
Format	Journal Article
Language	English
Published	Cham: Springer International Publishing, 01.09.2020
Subjects	Applications; Artificial Intelligence; Business Information Systems; Computational Biology/Bioinformatics; Computer Science; Data Mining and Knowledge Discovery; Database Management; Pareto; Objective; Multiple; Feasible solution; Regression; Optimal; Subset selection
ISSN	2364-415X 2364-4168
DOI	10.1007/s41060-020-00218-0

Summary:	The concept of Pareto optimality has been used in fields such as engineering and economics to understand fluid dynamics and consumer behavior. In machine learning contexts, Pareto optimality has been used to identify tuning parameters that best optimize a set of m criteria (multi-objective optimization). During regression model selection, data scientists often choose the model that is best on a single criterion (e.g., the Akaike information criterion (AIC) or R-squared (R²)) before checking a number of other model characteristics (e.g., model size, form, diagnostics, and interpretability). This strategy is multi-objective in nature but single-objective in its numeric execution. This paper first introduces a feasible solution algorithm (FSA) and explains how it can be applied to multi-objective problems in regression subset selection. We then introduce the general framework of Pareto optimality within the regression setting. Next, we apply the algorithm in a simulation setting, where we seek to estimate the first four Pareto boundaries for regression models using two model fit criteria. Finally, we present an application to a US communities and crime dataset.
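
The idea of successive Pareto boundaries mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' feasible solution algorithm: it assumes the candidate models have already been scored, that both criteria are to be minimized (for example AIC and 1 - R²), and it simply peels off non-dominated sets one after another. The function names, model labels, and scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float]  # (criterion 1, criterion 2), both to be minimized


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_boundaries(models: Dict[str, Scores], k: int) -> List[List[str]]:
    """Peel off up to k successive Pareto boundaries from the candidate models."""
    remaining = dict(models)
    boundaries: List[List[str]] = []
    while remaining and len(boundaries) < k:
        # A model is on the current boundary if no remaining model dominates it.
        front = sorted(
            name for name, s in remaining.items()
            if not any(dominates(t, s) for t in remaining.values())
        )
        boundaries.append(front)
        for name in front:
            del remaining[name]
    return boundaries


# Hypothetical (AIC, 1 - R^2) scores for five candidate subsets; toy values only.
candidates = {
    "x1": (120.0, 0.40),
    "x1+x2": (110.0, 0.30),
    "x1+x3": (115.0, 0.25),
    "x2+x3": (118.0, 0.35),
    "x1+x2+x3": (112.0, 0.32),
}
boundaries = pareto_boundaries(candidates, k=4)
```

In this toy example the first boundary contains the two subsets that trade off the two criteria against each other, and each later boundary contains the models that become non-dominated once the earlier boundaries are removed. Scoring every candidate subset exhaustively is only practical for small problems, which is presumably where a search heuristic such as the FSA becomes useful.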