A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization

Bibliographic Details
Published in	IEEE/CAA Journal of Automatica Sinica, Vol. 9, no. 5, pp. 812-833
Main Authors	Yi, Xinlei; Zhang, Shengjun; Yang, Tao; Chai, Tianyou; Johansson, Karl Henrik
Format	Journal Article
Language	English
Published	Piscataway: Chinese Association of Automation (CAA); The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2022
Author Affiliations	Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, and also affiliated with Digital Futures, Stockholm 10044, Sweden; Department of Electrical Engineering, University of North Texas, Denton, TX 76203, USA; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
Subjects	Algorithms; Collaborative work; Communication networks; Convergence; Cost function; Data exchange; Deep learning; Distributed nonconvex optimization; Information exchange; Linear speedup; Machine learning; Machine learning algorithms; Optimization; Parallel processing; Polyak-Łojasiewicz (P-Ł) condition; Primal-dual algorithm; Stochastic gradient descent
ISSN	2329-9266 2329-9274
DOI	10.1109/JAS.2022.105554

Summary:	The distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of n local cost functions by using local information exchange is considered. This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning. We propose a distributed primal-dual stochastic gradient descent (SGD) algorithm, suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for general nonconvex cost functions and the linear speedup convergence rate $\mathcal{O}(1/(nT))$ when the global cost function satisfies the Polyak-Łojasiewicz (P-Ł) condition, where T is the total number of iterations. We also show that the output of the proposed algorithm with constant parameters linearly converges to a neighborhood of a global optimum. We demonstrate through numerical experiments the efficiency of our algorithm in comparison with the baseline centralized SGD and recently proposed distributed SGD algorithms.
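
Illustration:	The summary does not give the update equations, so the following is only a minimal sketch of a generic distributed primal-dual SGD iteration of the kind it describes, not necessarily the article's exact algorithm. Everything specific here is an assumption made for illustration: the ring network, the scalar quadratic local costs f_i(x) = 0.5 (x - b_i)^2 (convex, chosen so the global optimum is simply the mean of the b_i), the Gaussian gradient noise, the constant parameters eta, alpha, beta, and the function names ring_laplacian and primal_dual_sgd. Each agent keeps a primal variable x_i and a dual variable v_i and uses only its neighbours' values through the graph Laplacian.

```python
import random


def ring_laplacian(n):
    """Graph Laplacian of an undirected ring with n >= 3 nodes (assumed topology)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2.0
        L[i][(i - 1) % n] = -1.0
        L[i][(i + 1) % n] = -1.0
    return L


def primal_dual_sgd(targets, steps=3000, eta=0.05, alpha=2.0, beta=1.0,
                    noise_std=0.1, seed=0):
    """Run a generic primal-dual SGD iteration with constant parameters.

    Agent i holds the hypothetical local cost 0.5 * (x - targets[i])**2 and
    only sees a noisy gradient of it. Returns the final primal variables.
    """
    rng = random.Random(seed)
    n = len(targets)
    L = ring_laplacian(n)
    x = [0.0] * n  # primal variables, one per agent
    v = [0.0] * n  # dual variables; starting at zero keeps their sum at zero

    for _ in range(steps):
        # Local information exchange: (L x)_i depends only on i's neighbours.
        Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Stochastic gradient of each local cost (true gradient plus noise).
        g = [x[i] - targets[i] + rng.gauss(0.0, noise_std) for i in range(n)]
        # Primal step: consensus term + dual term + stochastic gradient.
        x_new = [x[i] - eta * (alpha * Lx[i] + beta * v[i] + g[i])
                 for i in range(n)]
        # Dual step: accumulate the disagreement measured with the old x.
        v = [v[i] + eta * beta * Lx[i] for i in range(n)]
        x = x_new
    return x


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 4.0, 5.0]
    x = primal_dual_sgd(targets)
    optimum = sum(targets) / len(targets)
    print("global optimum:", optimum)
    print("agent estimates:", [round(xi, 3) for xi in x])
```

With constant parameters and noisy gradients, the agents' estimates in this toy setup settle near the common minimizer without reaching it exactly, which is consistent with the summary's statement about converging to a neighborhood of a global optimum. The quadratic costs are for checking the sketch only; the article itself addresses smooth, possibly nonconvex costs.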