Online Learning Under a Separable Stochastic Approximation Framework

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 47, No. 2, pp. 1317-1330
Main Authors: Gan, Min; Su, Xiang-xiang; Chen, Guang-yong; Chen, Jing; Chen, C. L. Philip
Format: Journal Article
Language: English
Published: IEEE, 01.02.2025
ISSN: 0162-8828, 2160-9292
DOI: 10.1109/TPAMI.2024.3495783

Summary: We propose an online learning algorithm tailored for a class of machine learning models within a separable stochastic approximation framework. The central idea of our approach is to exploit the inherent separability in many models, recognizing that certain parameters are easier to optimize than others. This paper focuses on models where some parameters exhibit linear characteristics, which are common in machine learning applications. In our proposed algorithm, the linear parameters are updated using the recursive least squares (RLS) algorithm, akin to a stochastic Newton method. Subsequently, based on these updated linear parameters, the nonlinear parameters are adjusted using stochastic gradient descent (SGD). This dual-update mechanism can be viewed as a stochastic approximation variant of block coordinate gradient descent, where one subset of parameters is optimized using a second-order method while the other is handled with a first-order approach. We establish the global convergence of our online algorithm for non-convex cases in terms of the expected violation of first-order optimality conditions. Numerical experiments demonstrate that our method achieves significantly faster initial convergence and produces more robust performance compared to other popular learning algorithms. Additionally, our algorithm exhibits reduced sensitivity to learning rates and outperforms the recently proposed slimTrain algorithm (Newman et al., 2022). For validation, the code has been made available on GitHub.
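
To make the dual-update mechanism concrete, the following is a minimal sketch, assuming a separable model y ≈ wᵀφ(x; θ) in which φ is a Gaussian RBF feature map whose centers play the role of the nonlinear parameters: the linear weights are refreshed by recursive least squares with a forgetting factor, and the centers then take one stochastic gradient step against the freshly updated weights. All names, the RBF parameterization, and the hyperparameters are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch (not the paper's released code): online learning for a
# separable model y ~= w^T phi(x; theta), where phi is a Gaussian RBF feature
# map and its centers play the role of the nonlinear parameters theta.
# The linear weights w are updated by recursive least squares (RLS) with a
# forgetting factor; the centers then take one SGD step using the updated w.
# Class/function names and all hyperparameters are illustrative assumptions.
import numpy as np


def rbf_features(x, centers, width=1.0):
    """phi_j(x) = exp(-||x - c_j||^2 / (2 * width^2))."""
    d2 = np.sum((x[None, :] - centers) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))


class SeparableOnlineLearner:
    def __init__(self, centers, width=1.0, forgetting=0.99, lr=1e-2, p0=100.0):
        self.centers = centers.astype(float)  # nonlinear parameters (SGD block)
        self.width = width
        self.w = np.zeros(len(centers))       # linear parameters (RLS block)
        self.P = p0 * np.eye(len(centers))    # RLS inverse-covariance estimate
        self.lam = forgetting                 # forgetting factor in (0, 1]
        self.lr = lr                          # SGD step size for the centers

    def step(self, x, y):
        # --- RLS update of the linear weights (stochastic-Newton-like) ---
        phi = rbf_features(x, self.centers, self.width)
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)    # RLS gain vector
        self.w += k * (y - self.w @ phi)      # correct w with the new sample
        self.P = (self.P - np.outer(k, Pphi)) / self.lam

        # --- SGD update of the nonlinear centers, given the fresh w ---
        resid = self.w @ phi - y              # residual with updated weights
        # d phi_j / d c_j = phi_j * (x - c_j) / width^2  (Gaussian RBF)
        grad_c = (resid * self.w[:, None] * phi[:, None]
                  * (x[None, :] - self.centers) / self.width ** 2)
        self.centers -= self.lr * grad_c
        return 0.5 * resid ** 2               # instantaneous squared loss


if __name__ == "__main__":
    # Toy 1-D streaming regression: y = sin(2x) observed one sample at a time.
    rng = np.random.default_rng(0)
    learner = SeparableOnlineLearner(centers=rng.uniform(-3, 3, size=(10, 1)))
    for t in range(2000):
        x = rng.uniform(-3.0, 3.0, size=1)
        loss = learner.step(x, np.sin(2.0 * x[0]))
    print(f"final instantaneous loss: {loss:.4f}")
```

Within the block-coordinate view described in the abstract, the RLS recursion plays the second-order (stochastic-Newton-like) role for the linear block, while the plain gradient step handles the nonlinear block; in practice one would tune the forgetting factor and learning rate, or replace the hand-written gradient with automatic differentiation.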