Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 4, pp. 974-988
Main Authors: You, Yang; Demmel, James; Czechowski, Kent; Song, Le; Vuduc, Rich
Format: Journal Article
Language: English
Published: New York, IEEE, 01.04.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2016.2608823

Summary: We consider how to design and implement communication-efficient versions of parallel kernel support vector machines (SVMs), a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input dataset. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Ω(P^3), where W is the problem size and P the number of processors; this scaling is worse than even that of a one-dimensional block-row dense matrix-vector multiplication, which has W = Ω(P^2). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM method that improves the isoefficiency to nearly W = Ω(P). We evaluate these methods on 96 to 1,536 processors and show speedups of 3-16× (7× on average) over Dis-SMO, and a 95 percent weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at [1].
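
To make the isoefficiency figures in the summary concrete, the following minimal Python sketch estimates how much the problem size W would have to grow to hold parallel efficiency constant when scaling from 96 to 1,536 processors. It is an illustration only, not code from the paper: the helper name required_growth is hypothetical, and the Ω lower bounds are treated as if they were tight (W proportional to P^k), which is an assumption.

```python
def required_growth(p_from: int, p_to: int, exponent: int) -> float:
    """Factor by which W must grow to keep efficiency constant,
    assuming (for illustration) that W scales exactly as P**exponent."""
    return (p_to / p_from) ** exponent


# Processor counts are the smallest and largest runs quoted in the summary.
cases = [
    ("W ~ P^3 (prior state-of-the-art implementation)", 3),
    ("W ~ P^2 (1D block-row dense matrix-vector multiply)", 2),
    ("W ~ P   (communication-avoiding SVM, approximately)", 1),
]

for label, k in cases:
    print(f"{label}: problem size must grow {required_growth(96, 1536, k):.0f}x")
```

Going from 96 to 1,536 processors is a 16-fold increase, so under these assumptions the required growth in problem size is 4096-fold, 256-fold, and 16-fold respectively, which is why a near-linear isoefficiency function matters for weak scaling.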
Bibliography: USDOE Office of Science (SC); SC0008700; SC0010200; AC02-05CH11231