Online Learning Under a Separable Stochastic Approximation Framework

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 47, No. 2, pp. 1317-1330
Main Authors: Gan, Min; Su, Xiang-xiang; Chen, Guang-yong; Chen, Jing; Chen, C. L. Philip
Format: Journal Article
Language: English
Published: IEEE, 01.02.2025
ISSN: 0162-8828, 2160-9292
DOI: 10.1109/TPAMI.2024.3495783

Summary: We propose an online learning algorithm tailored for a class of machine learning models within a separable stochastic approximation framework. The central idea of our approach is to exploit the inherent separability in many models, recognizing that certain parameters are easier to optimize than others. This paper focuses on models where some parameters exhibit linear characteristics, which are common in machine learning applications. In our proposed algorithm, the linear parameters are updated using the recursive least squares (RLS) algorithm, akin to a stochastic Newton method. Subsequently, based on these updated linear parameters, the nonlinear parameters are adjusted using stochastic gradient descent (SGD). This dual-update mechanism can be viewed as a stochastic approximation variant of block coordinate gradient descent, where one subset of parameters is optimized using a second-order method while the other is handled with a first-order approach. We establish the global convergence of our online algorithm for non-convex cases in terms of the expected violation of first-order optimality conditions. Numerical experiments demonstrate that our method achieves significantly faster initial convergence and produces more robust performance compared to other popular learning algorithms. Additionally, our algorithm exhibits reduced sensitivity to learning rates and outperforms the recently proposed slimTrain algorithm (Newman et al., 2022). For validation, the code has been made available on GitHub.
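
To make the dual-update mechanism concrete, the following is a minimal sketch, assuming a separable model y ≈ wᵀφ(x; θ) in which φ is a Gaussian RBF feature map whose centers play the role of the nonlinear parameters: the linear weights are refreshed by recursive least squares with a forgetting factor, and the centers then take one stochastic gradient step against the freshly updated weights. All names, the RBF parameterization, and the hyperparameters are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch (not the paper's released code): online learning for a
# separable model y ~= w^T phi(x; theta), where phi is a Gaussian RBF feature
# map and its centers play the role of the nonlinear parameters theta.
# The linear weights w are updated by recursive least squares (RLS) with a
# forgetting factor; the centers then take one SGD step using the updated w.
# Class/function names and all hyperparameters are illustrative assumptions.
import numpy as np


def rbf_features(x, centers, width=1.0):
    """phi_j(x) = exp(-||x - c_j||^2 / (2 * width^2))."""
    d2 = np.sum((x[None, :] - centers) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))


class SeparableOnlineLearner:
    def __init__(self, centers, width=1.0, forgetting=0.99, lr=1e-2, p0=100.0):
        self.centers = centers.astype(float)  # nonlinear parameters (SGD block)
        self.width = width
        self.w = np.zeros(len(centers))       # linear parameters (RLS block)
        self.P = p0 * np.eye(len(centers))    # RLS inverse-covariance estimate
        self.lam = forgetting                 # forgetting factor in (0, 1]
        self.lr = lr                          # SGD step size for the centers

    def step(self, x, y):
        # --- RLS update of the linear weights (stochastic-Newton-like) ---
        phi = rbf_features(x, self.centers, self.width)
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)    # RLS gain vector
        self.w += k * (y - self.w @ phi)      # correct w with the new sample
        self.P = (self.P - np.outer(k, Pphi)) / self.lam

        # --- SGD update of the nonlinear centers, given the fresh w ---
        resid = self.w @ phi - y              # residual with updated weights
        # d phi_j / d c_j = phi_j * (x - c_j) / width^2  (Gaussian RBF)
        grad_c = (resid * self.w[:, None] * phi[:, None]
                  * (x[None, :] - self.centers) / self.width ** 2)
        self.centers -= self.lr * grad_c
        return 0.5 * resid ** 2               # instantaneous squared loss


if __name__ == "__main__":
    # Toy 1-D streaming regression: y = sin(2x) observed one sample at a time.
    rng = np.random.default_rng(0)
    learner = SeparableOnlineLearner(centers=rng.uniform(-3, 3, size=(10, 1)))
    for t in range(2000):
        x = rng.uniform(-3.0, 3.0, size=1)
        loss = learner.step(x, np.sin(2.0 * x[0]))
    print(f"final instantaneous loss: {loss:.4f}")
```

Within the block-coordinate view described in the abstract, the RLS recursion plays the second-order (stochastic-Newton-like) role for the linear block, while the plain gradient step handles the nonlinear block; in practice one would tune the forgetting factor and learning rate, or replace the hand-written gradient with automatic differentiation.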