Computational and Statistical Guarantees for Tensor-on-Tensor Regression With Tensor Train Decomposition

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 47, No. 11, pp. 10577-10587
Main Authors: Qin, Zhen; Zhu, Zhihui
Format: Journal Article
Language: English
Published: United States: IEEE, 01.11.2025
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2025.3593840

Summary: Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios such as scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sampling complexity. Despite these practical benefits, a disparity remains between theoretical analysis and real-world performance. In this paper, we delve into the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis includes an upper error bound and a minimax lower bound, revealing that these bounds depend polynomially on the order N+M. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: an iterative hard thresholding (IHT) algorithm, which applies gradient descent followed by the TT-singular value decomposition (TT-SVD), and a factorization approach based on the Riemannian gradient descent (RGD) algorithm. When the RIP is satisfied, spectral initialization provides a proper starting point, and we establish the linear convergence rate of both IHT and RGD. Notably, the IHT optimizes the entire tensor in each iteration, maintaining the TT structure through the TT-SVD, which poses a storage challenge in practice. In contrast, the RGD optimizes the factors in the so-called left-orthogonal TT format, which enforces orthonormality among most of the factors, over the Stiefel manifold, thereby reducing the storage complexity of the IHT. However, this reduction in storage comes at a cost: the recovery performance of the RGD is worse than that of the IHT, although the error bounds of both algorithms depend polynomially on N+M. Experimental validation substantiates our theoretical findings.
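
The following minimal sketch (Python/NumPy) is not the authors' implementation; it only illustrates the IHT iteration described above on the scalar-on-tensor special case y_i = <A_i, X>: each iteration takes a gradient step on the least-squares loss and projects back onto the set of low-TT-rank tensors via the TT-SVD. The helper names tt_svd and iht_tt, the Gaussian measurement model, the step size, and the problem sizes are illustrative assumptions.

import numpy as np


def tt_svd(x, ranks):
    """Project a full tensor onto (at most) the given TT ranks via
    sequential truncated SVDs (the standard TT-SVD), then recontract."""
    dims = x.shape
    cores = []
    unfolding = x.reshape(dims[0], -1)
    r_prev = 1
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        r_k = min(ranks[k], len(s))                       # truncate to the target TT rank
        cores.append(u[:, :r_k].reshape(r_prev, dims[k], r_k))
        unfolding = (s[:r_k, None] * vt[:r_k]).reshape(r_k * dims[k + 1], -1)
        r_prev = r_k
    cores.append(unfolding.reshape(r_prev, dims[-1], 1))
    full = cores[0]                                       # recontract the TT cores
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape(dims)


def iht_tt(y, A, shape, ranks, step_size=0.5, n_iters=300):
    """IHT: a gradient step on 0.5 * ||A(X) - y||^2, then TT-SVD projection."""
    A_mat = A.reshape(len(y), -1)            # each row is one vectorized measurement tensor
    x = np.zeros(A_mat.shape[1])
    for _ in range(n_iters):
        grad = A_mat.T @ (A_mat @ x - y)     # gradient of the least-squares loss
        x = tt_svd((x - step_size * grad).reshape(shape), ranks).ravel()
    return x.reshape(shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape, ranks, m = (4, 4, 4, 4), (2, 2, 2), 1000
    x_true = tt_svd(rng.standard_normal(shape), ranks)    # a low-TT-rank ground truth
    A = rng.standard_normal((m, *shape)) / np.sqrt(m)     # Gaussian measurements (RIP holds w.h.p.)
    y = A.reshape(m, -1) @ x_true.ravel()
    x_hat = iht_tt(y, A, shape, ranks)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))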