Online Learning Under a Separable Stochastic Approximation Framework
| Published in | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 47, No. 2, pp. 1317-1330 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 01.02.2025 |
| ISSN | 0162-8828, 2160-9292 |
| DOI | 10.1109/TPAMI.2024.3495783 |
| Summary: | We propose an online learning algorithm tailored for a class of machine learning models within a separable stochastic approximation framework. The central idea of our approach is to exploit the inherent separability in many models, recognizing that certain parameters are easier to optimize than others. This paper focuses on models where some parameters exhibit linear characteristics, which are common in machine learning applications. In our proposed algorithm, the linear parameters are updated using the recursive least squares (RLS) algorithm, akin to a stochastic Newton method. Subsequently, based on these updated linear parameters, the nonlinear parameters are adjusted using stochastic gradient descent (SGD). This dual-update mechanism can be viewed as a stochastic approximation variant of block coordinate gradient descent, where one subset of parameters is optimized using a second-order method while the other is handled with a first-order approach. We establish the global convergence of our online algorithm for non-convex cases in terms of the expected violation of first-order optimality conditions. Numerical experiments demonstrate that our method achieves significantly faster initial convergence and more robust performance compared to other popular learning algorithms. Additionally, our algorithm exhibits reduced sensitivity to learning rates and outperforms the recently proposed slimTrain algorithm (Newman et al. 2022). For validation, the code has been made available on GitHub. |
|---|---|
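To make the dual-update mechanism described in the summary concrete, the following is a minimal sketch of the general RLS-plus-SGD idea on a toy model y ≈ W·tanh(Ax + b). It is not the authors' released code: the tanh feature map, the dimensions, the forgetting factor, and the learning rate are illustrative assumptions. On each incoming sample, the linear weights W are refreshed by recursive least squares, and the nonlinear parameters (A, b) then take a stochastic gradient step using the refreshed W.

```python
# Sketch of a separable online update: RLS on the linear block, SGD on the rest.
# Hypothetical toy model y ~ W @ tanh(A x + b); not the paper's released implementation.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_feat, d_out = 3, 16, 1                   # illustrative dimensions
A = rng.normal(scale=0.5, size=(d_feat, d_in))   # nonlinear parameters (updated by SGD)
b = np.zeros(d_feat)
W = np.zeros((d_out, d_feat))                    # linear parameters (updated by RLS)
P = 1e3 * np.eye(d_feat)                         # RLS inverse-covariance estimate
forget, lr = 1.0, 1e-2                           # forgetting factor, SGD learning rate

def features(x):
    """Nonlinear feature map phi(x; A, b)."""
    return np.tanh(A @ x + b)

def step(x, y):
    """One online update: RLS refresh of W, then an SGD step on (A, b)."""
    global W, P, A, b
    z = features(x)
    # --- RLS (stochastic Newton-like) update of the linear weights W ---
    Pz = P @ z
    k = Pz / (forget + z @ Pz)                   # RLS gain vector
    err = y - W @ z                              # prediction error with the old W
    W = W + np.outer(err, k)
    P = (P - np.outer(k, Pz)) / forget
    # --- SGD update of the nonlinear parameters, using the refreshed W ---
    r = W @ z - y                                # residual after the W update
    dz = (W.T @ r) * (1.0 - z**2)                # backprop through tanh
    A -= lr * np.outer(dz, x)
    b -= lr * dz

# Toy data stream: y = sin(sum(x)), observed one sample at a time.
for _ in range(2000):
    x = rng.normal(size=d_in)
    y = np.array([np.sin(x.sum())])
    step(x, y)

x_test = rng.normal(size=d_in)
print(W @ features(x_test), np.sin(x_test.sum()))
```

Updating W before taking the gradient step mirrors the block-coordinate structure described in the summary: the second-order RLS recursion handles the linear block, while the first-order SGD step handles the nonlinear block, which is one way to see why the initial fit can be fast even when the features are still being refined.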